Research

My research has mostly revolved around deep reinforcement learning (RL). Currently, I am investigating the adversarial robustness of RL algorithms in order to stabilize deep RL training and improve generalization.

Reliable Reinforcement Learning

Replicable Reinforcement Learning

Replicability has proven to be a key concern in machine learning, and especially in reinforcement learning. In this paper, we introduce a formal notion of Replicable Reinforcement Learning: a replicable RL algorithm produces, with high probability, identical policies, value functions, or MDPs across two runs of the same algorithm. We provide a first set of formally replicable algorithms for sample-based value iteration and exploration. First, we assume the generative-model setting with access to a parallel sampling subroutine. In this setting, our algorithm Replicable Phased Value Iteration produces the exact same value function across two runs. Then, we consider the episodic setting, in which an agent needs to explore the environment. Our Replicable Episodic R-Max finds a sequence of replicable known state-action pairs to compute identical MDPs across runs, showing that exploration can also be done replicably. We also validate our algorithms experimentally: they incur some sample-complexity overhead, but it is smaller than the theory suggests.
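As a rough illustration of one standard way to make sample-based estimates replicable (a minimal sketch of randomized rounding against a shared grid, not the paper's exact procedure; the function name and precision parameter are hypothetical), consider the following:

```python
import numpy as np

def replicable_round(values, precision, rng):
    """Snap each entry to a random grid of width `precision`.

    The grid offset is drawn from internal randomness (`rng`) that is shared
    across runs via the seed, so two runs whose sample-based estimates are
    close round to identical values with high probability.
    """
    offset = rng.uniform(0.0, precision)  # identical across runs with the same seed
    return np.round((values - offset) / precision) * precision + offset

# Hypothetical usage inside one phase of sample-based value iteration:
rng = np.random.default_rng(seed=0)                    # shared internal randomness
q_estimate = np.array([[0.31, 0.58], [0.42, 0.12]])    # (states, actions), estimated from samples
q_rounded = replicable_round(q_estimate, precision=0.05, rng=rng)
v_next = q_rounded.max(axis=1)                         # next value function, identical across runs (whp.)
```

The key design point is that the randomness used for rounding is part of the algorithm's shared internal state, so it does not vary between runs the way the environment samples do.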

Structure in Reinforcement Learning Agents

CompoSuite: A Compositional Reinforcement Learning Benchmark

In my first paper as a PhD student, we created CompoSuite, a benchmark for analyzing the compositional capabilities of RL algorithms. CompoSuite consists of 256 distinct tasks that are designed compositionally: each task requires the agent to use a robot arm to manipulate an object and achieve an objective while avoiding an obstacle. To enable RL training, each task comes with a shaped reward function. Because of this design, reasoning via composition should enable an agent to transfer knowledge across tasks. We show that a compositional learner can learn all possible tasks and zero-shot transfer to unseen tasks, while a classical multi-task learner has trouble generalizing.
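To make the compositional structure concrete, here is a minimal sketch of how 256 tasks can arise from four axes with four choices each (the element names below are hypothetical placeholders, not the actual CompoSuite identifiers):

```python
from itertools import product

# Hypothetical element names; one value per axis defines a task.
robot_arms = ["arm_1", "arm_2", "arm_3", "arm_4"]
objects    = ["object_1", "object_2", "object_3", "object_4"]
objectives = ["objective_1", "objective_2", "objective_3", "objective_4"]
obstacles  = ["obstacle_1", "obstacle_2", "obstacle_3", "obstacle_4"]

# Every combination of one element per axis is a task: 4 * 4 * 4 * 4 = 256.
tasks = list(product(robot_arms, objects, objectives, obstacles))
assert len(tasks) == 256
```

A compositional learner can reuse a module per element across all tasks that share it, instead of treating each of the 256 tasks as an unrelated learning problem.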

Structured Object-Aware State Representation Learning

For my Master’s thesis, I worked on object-aware state representations for deep RL. We used a dynamics prediction model to extract concrete positions and velocities from visual observations. These positions and velocities are encoded in a structured latent space and used for world-model prediction. We showed how such structured representations can be used for planning and model-based RL, and how they can speed up model-free RL training. Part of the work conducted during the thesis was later published as Structured Object-Aware Physics Prediction for Video Modeling and Planning at ICLR 2020.
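To illustrate what a structured latent and a world-model step can look like, here is a toy sketch (not the thesis's actual architecture, which involves a visual encoder and richer interaction modeling): per-object positions and velocities are rolled forward with a learned acceleration model.

```python
import torch
import torch.nn as nn

class StructuredDynamics(nn.Module):
    """Toy world model over a structured latent: per-object positions and velocities."""

    def __init__(self, num_objects, dim=2, hidden=64):
        super().__init__()
        # Predict per-object accelerations from the full scene state.
        self.accel_net = nn.Sequential(
            nn.Linear(num_objects * dim * 2, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_objects * dim),
        )

    def forward(self, pos, vel, dt=0.1):
        # pos, vel: (batch, num_objects, dim) -- the interpretable latent state
        b, n, d = pos.shape
        scene = torch.cat([pos, vel], dim=-1).reshape(b, -1)
        acc = self.accel_net(scene).reshape(b, n, d)
        new_vel = vel + dt * acc           # explicit Euler integration keeps the latent
        new_pos = pos + dt * new_vel       # readable as physical quantities
        return new_pos, new_vel

model = StructuredDynamics(num_objects=3)
pos, vel = torch.randn(1, 3, 2), torch.randn(1, 3, 2)
next_pos, next_vel = model(pos, vel)       # one step of world-model prediction
```

Keeping positions and velocities explicit in the latent is what allows the same predictions to be consumed directly by a planner or a model-based RL agent.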

Deep and Hierarchical Reinforcement Learning

Learning to play StarCraft II

During my Master’s degree, I first came into contact with RL in a research project at Jan Peters' lab. We created an open-source repository that reproduces the results of the first paper on learning to play parts of the real-time strategy video game StarCraft II (SC2) [1]. The code and report are available here. In a separate strand of follow-up work, we extended the implementation to include the then-new Proximal Policy Optimization [2] algorithm, which had not yet been tested on SC2. Additionally, we extended the FeUdal Networks [3] neural network architecture with spatially aware actions, in the hope of learning priors that help transfer policies across games. The code with the extended content as well as the report can be found here.
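For reference, the core change PPO [2] introduces over a plain policy-gradient update is the clipped surrogate objective. The sketch below shows the standard formulation in Python, not the exact code from our repository:

```python
import torch

def ppo_clip_loss(log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Clipped surrogate objective from Schulman et al. [2]."""
    ratio = torch.exp(log_probs - old_log_probs)                        # pi_theta / pi_theta_old
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()                        # maximize surrogate = minimize negative
```

Clipping the probability ratio keeps each policy update close to the data-collecting policy, which is what made PPO an appealing drop-in upgrade over the A3C baseline in our SC2 experiments.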

Here is a video of the original A3C agent trained using our implementation :point_down:.

  1. StarCraft II: A New Challenge for Reinforcement Learning. Oriol Vinyals, et al. 2017. 

  2. Proximal Policy Optimization Algorithms. John Schulman, et al. 2017. 

  3. FeUdal Networks for Hierarchical Reinforcement Learning. Alexander Sasha Vezhnevets, et al. 2017.