endtoendAI

GSoC TensorFlow Part 2: Improving Documentation

 reinforcement-learning  gsoc

A great way to learn the material is to make modifications. This week, I summarize my experience of creating a pull request to TF-Agents to improve its documentation.

RL Weekly 18: Survey of Domain Randomization Techniques for Sim-to-Real Transfer, and Evaluating Deep RL with ToyBox

 reinforcement-learning  rl-weekly

This week, we introduce a survey of Domain Randomization Techniques for Sim-to-Real Transfer and ToyBox, a suite of redesigned Atari Environments for experimental evaluation of deep RL.

GSoC TensorFlow Part 1: Setting Up TF-Agents

 reinforcement-learning  gsoc

I have been accepted to the Google Summer of Code program to work on TensorFlow for three months. I will be working on TensorFlow's reinforcement learning library TF-Agents. In this post, I briefly summarize the steps I took to set up the TF-Agents environment for future reference.

RL Weekly 17: Information Asymmetry in KL-regularized Objective, Real-world Challenges to RL, and Fast and Slow RL

 reinforcement-learning  rl-weekly

In this issue, we summarize the use of information asymmetry in a KL-regularized objective to regularize the policy, the challenges of deploying deep RL in real-world systems, and possible insights into psychology and neuroscience from deep RL.

Using TensorBoard with PyTorch 1.1.0

 pytorch

With PyTorch 1.1.0, TensorBoard is now natively supported in PyTorch. This post contains detailed instructions to install TensorBoard.

Collapsible Code Blocks in GitHub Pages

 blogging

Here is a quick guide on using collapsible code blocks in GitHub Pages. This is handy when a post contains a large output that only a few readers will need.
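
A minimal sketch of the idea, assuming the collapsible block is built from the standard HTML `<details>` and `<summary>` elements; the post's exact markup is not reproduced here, and the helper name `collapsible` is hypothetical.

```python
from html import escape


def collapsible(summary: str, body: str) -> str:
    """Wrap `body` in an HTML <details> element that stays collapsed until clicked.

    Both arguments are HTML-escaped so raw program output renders literally.
    """
    return (
        "<details>\n"
        f"<summary>{escape(summary)}</summary>\n"
        f"<pre><code>{escape(body)}</code></pre>\n"
        "</details>"
    )


if __name__ == "__main__":
    # Paste the printed markup into a Markdown post; the log stays hidden by default.
    print(collapsible("Full training log", "step 1: loss=0.5 <start>"))
```

Escaping the body is a design choice for this sketch: it keeps angle brackets in logs from being interpreted as tags.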

RL Weekly 16: Why Performance Plateaus May Occur, and Compressing DQNs

 reinforcement-learning  rl-weekly

In this issue, we introduce 'ray interference,' a possible cause of performance plateaus in deep reinforcement learning conjectured by Google DeepMind. We also introduce a network distillation method proposed by researchers at Carnegie Mellon University.

RL Weekly 15: Learning without Rewards: from Active Queries or Suboptimal Demonstrations

 reinforcement-learning  rl-weekly

In this issue, we introduce VICE-RAQ by UC Berkeley and T-REX by UT Austin and Preferred Networks. VICE-RAQ trains a classifier to infer rewards from goal examples and active querying. T-REX learns reward functions from suboptimal demonstrations ranked by humans.

RL Weekly 14: OpenAI Five and Berkeley Blue

 reinforcement-learning  rl-weekly

In this week's issue, we summarize the Dota 2 match between OpenAI Five and OG eSports and introduce Blue, a new low-cost robot developed by the Robot Learning Lab at UC Berkeley.

RL Weekly 13: Learning to Toss, Learning to Paint, and How to Explain RL

 reinforcement-learning  rl-weekly

In this week's issue, we summarize results from Princeton, Google, Columbia, and MIT on training a robot arm to throw objects. We also look at a model-based DDPG developed by Peking University and Megvii that can reproduce pictures through paint strokes. Finally, we look at an empirical study by Oregon State University about explaining RL to laypeople.

RL Weekly 12: Atari Demos with Human Gaze Labels, New SOTA in Meta-RL, and a Hierarchical Take on Intrinsic Rewards

 reinforcement-learning  rl-weekly

This week, we look at a new demo dataset of Atari games that includes trajectories and human gaze. We also look at PEARL, a new meta-RL method that boasts sample efficiency and performance superior to previous state-of-the-art algorithms. Finally, we look at a novel method of incorporating intrinsic rewards.

RL Weekly 11: The Bitter Lesson by Richard Sutton, the Promise of Hierarchical RL, and Exploration with Human Feedback

 reinforcement-learning  rl-weekly

In this issue, we first look at a diary entry by Richard S. Sutton (DeepMind, UAlberta) on Compute versus Clever. Then, we look at a post summarizing Hierarchical RL by Yannis Flet-Berliac (INRIA SequeL). Finally, we summarize a paper incorporating human feedback for exploration from Delft University of Technology.

RL Weekly 10: Learning from Playing, Understanding Multi-agent Intelligence, and Navigating in Google Street View

 reinforcement-learning  rl-weekly

In this issue, we look at Google Brain's algorithm of learning by playing, DeepMind's thoughts on multi-agent intelligence, and DeepMind's new navigation environment using Google Street View data.

RL Weekly 9: Sample-efficient Near-SOTA Model-based RL, Neural MMO, and Bottlenecks in Deep Q-Learning

 reinforcement-learning  rl-weekly

In this issue, we look at SimPLe, a model-based RL algorithm that achieves near-state-of-the-art results on the Arcade Learning Environment (ALE). We also look at Neural MMO, a new multi-agent environment by OpenAI, and an empirical analysis of possible sources of error in deep Q-learning by BAIR.

RL Weekly 8: World Discovery Models, MuJoCo Soccer Environment, and Deep Planning Network

 reinforcement-learning  rl-weekly

In this issue, we introduce World Discovery Models and MuJoCo Soccer Environment from Google DeepMind, and PlaNet from Google.

Obstacle Tower 6: Submitting a Random Agent

 reinforcement-learning  obstacle-tower  competition

We submit a random agent to the Obstacle Tower Challenge that just began.

RL Weekly 7: Obstacle Tower Challenge, Hanabi Learning Environment, and Spinning Up Workshop

 reinforcement-learning  rl-weekly

This week, we introduce the Obstacle Tower Challenge, a new RL competition by Unity, Hanabi Learning Environment, a multi-agent environment by DeepMind, and Spinning Up Workshop, a workshop hosted by OpenAI.

Obstacle Tower 5: Possible Improvements to the Baselines

 reinforcement-learning  obstacle-tower  competition

We play the Obstacle Tower game to understand the qualities of a successful agent.

Obstacle Tower 4: Understanding the Baselines

 reinforcement-learning  obstacle-tower  competition

We briefly introduce Rainbow and PPO, the two baselines that were tested on Obstacle Tower.

Slow Papers: The Obstacle Tower: A Generalization Challenge in Vision, Control, and Planning (Juliani et al., 2019)

 reinforcement-learning  slowpapers

The rapid pace of research development in Deep Reinforcement Learning has been driven by the presence of fast and challenging simulation environments. These environments often take the form of video games, such as the Atari games provided in the Arcade Learning Environment (ALE). In the past year, however, significant progress has been made in achieving superhuman performance on even the most difficult and heavily studied game in the ALE: Montezuma's Revenge. We propose a new benchmark environment, Obstacle Tower: a high visual fidelity, 3D, 3rd person, procedurally generated environment. An agent in the Obstacle Tower must learn to solve both low level control and high-level planning problems in tandem learning from pixels and a sparse reward signal in order to make it as high as possible up the tower. In this paper we outline the environment and provide a set of initial baseline results using current state of the art Deep RL methods as well as human players. In all cases these algorithms fail to produce agents capable of performing anywhere near human level on a set of evaluations designed to test both memorization and generalization ability. As such, we believe that the Obstacle Tower has the potential to serve as a helpful Deep RL benchmark now and into the future.

Fast Papers: Neural Episodic Control (Pritzel et al., 2017)

 reinforcement-learning  fastpapers

Deep reinforcement learning methods attain super-human performance in a wide range of environments. Such methods are grossly inefficient, often taking orders of magnitudes more data than humans to achieve reasonable performance. We propose Neural Episodic Control: a deep reinforcement learning agent that is able to rapidly assimilate new experiences and act upon them. Our agent uses a semi-tabular representation of the value function: a buffer of past experience containing slowly changing state representations and rapidly updated estimates of the value function. We show across a wide range of environments that our agent learns significantly faster than other state-of-the-art, general purpose deep reinforcement learning agents.

Fast Papers: The Obstacle Tower: A Generalization Challenge in Vision, Control, and Planning (Juliani et al., 2019)

 reinforcement-learning  fastpapers

The rapid pace of research development in Deep Reinforcement Learning has been driven by the presence of fast and challenging simulation environments. These environments often take the form of video games, such as the Atari games provided in the Arcade Learning Environment (ALE). In the past year, however, significant progress has been made in achieving superhuman performance on even the most difficult and heavily studied game in the ALE: Montezuma's Revenge. We propose a new benchmark environment, Obstacle Tower: a high visual fidelity, 3D, 3rd person, procedurally generated environment. An agent in the Obstacle Tower must learn to solve both low level control and high-level planning problems in tandem learning from pixels and a sparse reward signal in order to make it as high as possible up the tower. In this paper we outline the environment and provide a set of initial baseline results using current state of the art Deep RL methods as well as human players. In all cases these algorithms fail to produce agents capable of performing anywhere near human level on a set of evaluations designed to test both memorization and generalization ability. As such, we believe that the Obstacle Tower has the potential to serve as a helpful Deep RL benchmark now and into the future.

Obstacle Tower 3: Observation Space and Action Space

 reinforcement-learning  obstacle-tower  competition

We analyze the observation space and the action space provided by the Obstacle Tower environment.

Obstacle Tower 2: Playing the Game

 reinforcement-learning  obstacle-tower  competition

We play the Obstacle Tower game to understand the qualities of a successful agent.

Obstacle Tower 1: Installing the Environment

 reinforcement-learning  obstacle-tower  competition

Unity introduced the Obstacle Tower Challenge, a new reinforcement learning contest with a difficult environment. In this post, we guide the readers on installing the environment on Linux using conda.

RL Weekly 6: AlphaStar, Rectified Nash Response, and Causal Reasoning with Meta RL

 reinforcement-learning  rl-weekly

This week, we look at AlphaStar, a StarCraft II AI, PSRO_rN, an evaluation algorithm encouraging a diverse population of well-trained agents, and a novel Meta-RL approach for causal reasoning. All three results are from DeepMind.

Deep RL Seminar Week 2: Deep Q-Networks

 reinforcement-learning  deep-rl-seminar

This week, we reviewed various improvements made to the Deep Q-Network algorithm.

RL Weekly 5: Robust Control of Legged Robots, Compiler Phase-Ordering, and Go Explore on Sonic the Hedgehog

 reinforcement-learning  rl-weekly

This week, we look at impressive robust control of legged robots by ETH Zurich and Intel, compiler phase-ordering by UC Berkeley and MIT, and a partial implementation of Uber's Go Explore.

RL Weekly 4: Generating Problems with Solutions, Optical Flow with RL, and Model-free Planning

 reinforcement-learning  rl-weekly

In this issue, we introduce a new curriculum learning algorithm by Uber AI Labs, a model-free planning algorithm by DeepMind, and an optical-flow-based control algorithm by Intel Labs and the University of Freiburg.

RL Weekly 3: Learning to Drive through Dense Traffic, Learning to Walk, and Summarizing Progress in Sim-to-Real

 reinforcement-learning  rl-weekly

In this issue, we introduce the DeepTraffic competition from Lex Fridman's MIT Deep Learning for Self-Driving Cars course. We also review a new paper on using SAC to control a four-legged robot, and introduce a website summarizing progress in sim-to-real algorithms.

PyTorch Implementations of Policy Gradient Methods

 reinforcement-learning  policy-gradient  pytorch

A well-written baseline is crucial to research. We compare and recommend popular open source implementations of reinforcement learning algorithms in PyTorch.

RL Weekly 2: Tuning AlphaGo, Macro-strategy for MOBA, Sim-to-Real with conditional GANs

 reinforcement-learning  rl-weekly

In this issue, we discuss hyperparameter tuning for AlphaGo from DeepMind, a hierarchical RL model for a MOBA game from Tencent, and a GAN-based Sim-to-Real algorithm from X, Google Brain, and DeepMind.

RL Weekly 1: Soft Actor-Critic Code Release; Text-based RL Competition; Learning with Training Wheels

 reinforcement-learning  rl-weekly

In this inaugural issue of the RL Weekly newsletter, we discuss Soft Actor-Critic (SAC) from BAIR, the new TextWorld competition by Microsoft Research, and AsDDPG from the University of Oxford and Heriot-Watt University.

Slow Papers: Exploration by Random Network Distillation (Burda et al., 2018)

 reinforcement-learning  slowpapers

We introduce an exploration bonus for deep reinforcement learning methods that is easy to implement and adds minimal overhead to the computation performed. The bonus is the error of a neural network predicting features of the observations given by a fixed randomly initialized neural network. We also introduce a method to flexibly combine intrinsic and extrinsic rewards. We find that the random network distillation (RND) bonus combined with this increased flexibility enables significant progress on several hard exploration Atari games. In particular we establish state of the art performance on Montezuma's Revenge, a game famously difficult for deep reinforcement learning methods. To the best of our knowledge, this is the first method that achieves better than average human performance on this game without using demonstrations or having access to the underlying state of the game, and occasionally completes the first level. This suggests that relatively simple methods that scale well can be sufficient to tackle challenging exploration problems.

Slow Papers: A Deeper Look at Experience Replay (Zhang and Sutton, 2017)

 reinforcement-learning  slowpapers

Recently experience replay is widely used in various deep reinforcement learning (RL) algorithms, in this paper we rethink the utility of experience replay. It introduces a new hyper-parameter, the memory buffer size, which needs carefully tuning. However unfortunately the importance of this new hyper-parameter has been underestimated in the community for a long time. In this paper we did a systematic empirical study of experience replay under various function representations. We showcase that a large replay buffer can significantly hurt the performance. Moreover, we propose a simple O(1) method to remedy the negative influence of a large replay buffer. We showcase its utility in both simple grid world and challenging domains like Atari games.

Slow Papers: Neural Fitted Q Iteration (Riedmiller, 2005)

 reinforcement-learning  slowpapers

This paper introduces NFQ, an algorithm for efficient and effective training of a Q-value function represented by a multi-layer perceptron. Based on the principle of storing and reusing transition experiences, a model-free, neural network based RL algorithm is proposed. The method is evaluated on three benchmark problems. It is shown empirically, that reasonably few interactions with the plant are needed to generate control policies of high quality.

AI for Prosthetics Week 9 - 10: Unorthodox Approaches

 reinforcement-learning  ai-for-prosthetics  competition

We end the series by exploring possible unorthodox approaches for the competition. These are approaches that deviate from the popular policy gradient methods such as DDPG or PPO.

Notes from the ai.x 2018 Conference: Faster Reinforcement Learning via Transfer

 reinforcement-learning  conference

SK T-Brain hosted the ai.x Conference on September 6th in Seoul, South Korea. At this conference, John Schulman (OpenAI) spoke about faster reinforcement learning via transfer.

Pommerman 1: Understanding the Competition

 reinforcement-learning  competition

Pommerman is one of the NIPS 2018 Competition tracks, where the participants seek to build agents to compete against other agents in a game of Bomberman. In this post, we explain only the basics of Pommerman, leaving reinforcement learning to later posts.

AI for Prosthetics Week 6: General Techniques of RL

 reinforcement-learning  ai-for-prosthetics  competition

This week, we take a step back from the competition and study common techniques used in Reinforcement Learning.

AI for Prosthetics Week 5: Understanding the Reward

 reinforcement-learning  ai-for-prosthetics  competition

The goal of reinforcement learning is defined by the reward signal: to maximize the cumulative reward throughout an episode. In some ways, the reward is the most important aspect of the environment for the agent: even if it does not know about values of states or actions (like Evolutionary Strategies), if it can consistently get a high return (cumulative reward), it is a great agent.
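
As a small illustration of "return (cumulative reward)", the sketch below sums the rewards of one episode. The function name `episode_return` and the optional discount factor `gamma` are assumptions added for illustration; with the default `gamma=1.0` it is the plain cumulative reward described above.

```python
def episode_return(rewards, gamma=1.0):
    """Return the (optionally discounted) sum of rewards for one episode.

    `gamma` is a hypothetical knob, not something the post specifies;
    gamma=1.0 gives the undiscounted cumulative reward.
    """
    total = 0.0
    # Accumulate from the last step backwards: G_t = r_t + gamma * G_{t+1}.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


if __name__ == "__main__":
    print(episode_return([1.0, 2.0, 3.0]))       # undiscounted sum
    print(episode_return([1.0, 2.0, 3.0], 0.5))  # discounted variant
```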

AI for Prosthetics Week 3-4: Understanding the Observation Space

 reinforcement-learning  ai-for-prosthetics  competition

The observation can be roughly divided into five components: the body parts, the joints, the muscles, the forces, and the center of mass. For each body part component, the agent observes its position, velocity, acceleration, rotation, rotational velocity, and rotational acceleration.

AI for Prosthetics Week 2: Understanding the Action Space

 reinforcement-learning  ai-for-prosthetics  competition

Last week, we saw that a valid action has 19 numbers, each between 0 and 1. The 19 numbers represent the amount of force to apply to each muscle. I know barely anything about muscles, so I decided to manually go through all the muscles to understand the effects of each muscle...
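
A minimal sketch of what "a valid action" means here, assuming the action is passed as a plain sequence of numbers. The helper `clip_action` and the constant `NUM_MUSCLES` are hypothetical names, not part of the competition's API.

```python
NUM_MUSCLES = 19  # from the post: one number per muscle


def clip_action(action):
    """Check the action length and clamp every entry into [0, 1]."""
    if len(action) != NUM_MUSCLES:
        raise ValueError(f"expected {NUM_MUSCLES} values, got {len(action)}")
    return [min(1.0, max(0.0, float(a))) for a in action]


if __name__ == "__main__":
    raw = [-0.5] + [0.3] * 17 + [1.7]  # first and last entries are out of range
    print(clip_action(raw))
```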

AI for Prosthetics Week 1: Understanding the Challenge

 reinforcement-learning  ai-for-prosthetics  competition

The AI for Prosthetics challenge is one of the NIPS 2018 Competition tracks. In this challenge, the participants seek to build an agent that can make a 3D model of a human with prosthetics run. This challenge is a continuation of the Learning to Run challenge (shown below) that was part of the NIPS 2017 Competition Track. The challenge was enhanced in three ways...

Jupyter Notebook extensions to enhance your efficiency

Jupyter Notebook is a great tool that allows you to integrate live code, equations, visualizations and narrative text into a document. It is used extensively in data science. However, for developers who have used IDEs with abundant features, the simplicity of Jupyter Notebook might be problematic.

Bias-variance Tradeoff in Reinforcement Learning

 reinforcement-learning

The bias-variance tradeoff is a familiar term to most people who have studied machine learning. In the context of Machine Learning, bias and variance refer to the model: a model that underfits the data has high bias, whereas a model that overfits the data has high variance. In Reinforcement Learning, we consider another bias-variance tradeoff.

I learned DQNs with OpenAI competition

 reinforcement-learning  competition

In April, OpenAI held a two-month-long competition called the Retro Contest where participants had to develop an agent that can perform well on unseen custom-made stages of Sonic the Hedgehog. The agents were limited to 100 million steps per stage and 12 hours of time on a VM with 6 E5-2690v3 cores, 56GB of RAM, and a single K80 GPU.

Effective Data: Partition

 machine-learning

To train a good model, you need lots of data. Luckily, over the last few decades, collecting data has become much easier. However, there is little value in data if you use it incorrectly. Even if you double or triple the dataset manually or through data augmentation, without a proper partition of the data, you will be left clueless about how helpful adding more data was.
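
One common way to partition data is a shuffled train/validation/test split. The sketch below is an illustration under assumptions, not the post's own method: the function name `partition`, the 80/10/10 default fractions, and the fixed seed are all hypothetical choices.

```python
import random


def partition(items, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle `items` and split them into disjoint train, validation, and test lists."""
    items = list(items)
    # A seeded generator keeps the split reproducible across runs.
    random.Random(seed).shuffle(items)
    n_test = int(len(items) * test_frac)
    n_val = int(len(items) * val_frac)
    test = items[:n_test]
    val = items[n_test:n_test + n_val]
    train = items[n_test + n_val:]
    return train, val, test


if __name__ == "__main__":
    train, val, test = partition(range(100))
    print(len(train), len(val), len(test))
```

Holding the validation and test sets fixed while the training set grows is what makes it possible to judge whether the extra data helped.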