This is a collection of presentation slides on Artificial Intelligence books and state-of-the-art papers.
Reinforcement Learning: An Introduction by Sutton and Barto
- Chapter 1: Introduction [SlideShare] [Google Slides] [PDF]
- Chapter 2: Multi-armed Bandits [SlideShare] [Google Slides] [PDF]
- Chapter 3: Finite Markov Decision Processes [SlideShare] [Google Slides] [PDF]
- Chapter 4: Dynamic Programming [SlideShare] [Google Slides] [PDF]
- Chapter 5: Monte Carlo Methods [SlideShare] [Google Slides] [PDF]
- Chapter 6: Temporal-Difference Learning [SlideShare] [Google Slides] [PDF]
- Chapter 7: n-step Bootstrapping [SlideShare] [Google Slides] [PDF]
- Chapter 8: Planning and Learning with Tabular Methods [SlideShare] [Google Slides] [PDF]
- Chapter 10: On-policy Control with Approximation [SlideShare] [Google Slides] [PDF]
- Chapter 13: Policy Gradient Methods [SlideShare] [Google Slides] [PDF]
Learning Dexterous In-Hand Manipulation
OpenAI (Andrychowicz et al.) - August 2018
We use reinforcement learning (RL) to learn dexterous in-hand manipulation policies which can perform vision-based object reorientation on a physical Shadow Dexterous Hand. The training is performed in a simulated environment in which we randomize many of the physical properties of the system like friction coefficients and an object’s appearance. Our policies transfer to the physical robot despite being trained entirely in simulation. Our method does not rely on any human demonstrations, but many behaviors found in human manipulation emerge naturally, including finger gaiting, multi-finger coordination, and the controlled use of gravity. Our results were obtained using the same distributed RL system that was used to train OpenAI Five. We also include a video of our results: https://youtu.be/jwSbzNHGflM.
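The sim-to-real trick here is domain randomization: every training episode runs in a simulator whose physical and visual properties are re-sampled, so the policy must work across the whole spread of parameters rather than in one calibrated simulator. Below is a minimal sketch of that outer loop; the parameter names, ranges, and the `make_env`/`update_policy` hooks are hypothetical placeholders, not the paper's actual configuration.

```python
import random

# Hypothetical randomization ranges; the paper randomizes many more properties
# (masses, damping, actuator gains, lighting, camera pose, ...).
RANDOMIZATION_RANGES = {
    "friction_scale":   (0.5, 1.5),    # multiplier on nominal friction coefficients
    "object_scale":     (0.95, 1.05),  # multiplier on object size
    "object_hue_shift": (-0.1, 0.1),   # appearance perturbation for the vision model
}

def sample_environment_params():
    """Draw one fresh set of physics/appearance parameters."""
    return {name: random.uniform(lo, hi)
            for name, (lo, hi) in RANDOMIZATION_RANGES.items()}

def train(num_episodes, make_env, update_policy):
    """Outer loop: a newly randomized simulator for every episode."""
    for _ in range(num_episodes):
        env = make_env(sample_environment_params())  # simulator built with the sample
        update_policy(env)                           # collect rollouts, apply RL update
```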
Learning Montezuma’s Revenge from a Single Demonstration
T. Salimans, R. Chen - July 2018
We’ve trained an agent to achieve a high score of 74,500 on Montezuma’s Revenge from a single human demonstration, better than any previously published result. Our algorithm is simple: the agent plays a sequence of games starting from carefully chosen states from the demonstration, and learns from them by optimizing the game score using PPO, the same reinforcement learning algorithm that underpins OpenAI Five.
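The method amounts to a reset-state curriculum over the demonstration: train with PPO from a start state late in the demo, then move the start point earlier once the agent reliably finishes from there. A minimal sketch of that loop, assuming hypothetical `train_from_state` and `solved` hooks and a `step_back` knob; the actual state-selection schedule is a detail of the post.

```python
def demo_curriculum(demo_states, train_from_state, solved, step_back=50):
    """Backward curriculum over a recorded demonstration.

    demo_states:      emulator snapshots saved along the demo trajectory.
    train_from_state: runs PPO training episodes that reset to the given state.
    solved:           True once the agent reliably finishes from that state.
    step_back:        how far to move the reset point back along the demo.
    """
    start = len(demo_states) - 1              # begin near the end of the demo
    while start > 0:
        train_from_state(demo_states[start])
        if solved(demo_states[start]):        # agent matches the demo from here on,
            start = max(0, start - step_back) # so make the task slightly harder
```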
A Deeper Look at Experience Replay
S. Zhang, R. Sutton - December 2017
Experience replay is widely used in various deep reinforcement learning (RL) algorithms; in this paper we rethink its utility. Experience replay introduces a new hyperparameter, the memory buffer size, which needs careful tuning, yet the importance of this hyperparameter has long been underestimated in the community. We conduct a systematic empirical study of experience replay under various function representations and show that a large replay buffer can significantly hurt performance. Moreover, we propose a simple O(1) method to remedy the negative influence of a large replay buffer, and we demonstrate its utility in both a simple grid world and challenging domains such as Atari games.
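For concreteness, here is a minimal sketch of the replay buffer whose capacity is the hyperparameter under study, plus one O(1) fix consistent with the abstract: always include the newest transition in each sampled mini-batch, so a very large buffer cannot delay learning from fresh experience. That this matches the paper's exact remedy is our reading of the abstract, not a quote.

```python
import random

class ReplayBuffer:
    """Circular experience replay buffer; `capacity` is the hyperparameter studied."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.storage = []
        self.next_idx = 0  # next write position (wraps around)

    def add(self, transition):
        if len(self.storage) < self.capacity:
            self.storage.append(transition)
        else:
            self.storage[self.next_idx] = transition  # evict the oldest entry
        self.next_idx = (self.next_idx + 1) % self.capacity
        self.latest = transition

    def sample(self, batch_size, combined=True):
        batch = random.sample(self.storage, batch_size)
        if combined:
            # O(1) remedy (our reading of the abstract): always learn from the
            # newest transition too, regardless of how large the buffer is.
            batch[-1] = self.latest
        return batch
```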
Playing Atari with Deep Reinforcement Learning
Mnih et al. - December 2013
We present the first deep learning model to successfully learn control policies directly from high-dimensional sensory input using reinforcement learning. The model is a convolutional neural network, trained with a variant of Q-learning, whose input is raw pixels and whose output is a value function estimating future rewards. We apply our method to seven Atari 2600 games from the Arcade Learning Environment, with no adjustment of the architecture or learning algorithm. We find that it outperforms all previous approaches on six of the games and surpasses a human expert on three of them.
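The "variant of Q-learning" regresses the network toward one-step bootstrapped targets: y = r + γ·max_a′ Q(s′, a′) for non-terminal transitions and y = r at episode ends. A minimal numpy sketch of that target computation follows; the batch shapes are our convention, not the paper's notation.

```python
import numpy as np

def q_learning_targets(rewards, next_q_values, terminal, gamma=0.99):
    """Bootstrapped targets y = r + gamma * max_a' Q(s', a') for a mini-batch.

    rewards:       (B,)   rewards from the replayed transitions
    next_q_values: (B, A) network outputs Q(s', .) for each next state
    terminal:      (B,)   True where the episode ended (no bootstrap term)
    """
    bootstrap = gamma * next_q_values.max(axis=1)
    return rewards + np.where(terminal, 0.0, bootstrap)

# The network is then trained on loss = mean((Q(s, a) - y)^2),
# with the targets y treated as constants during the gradient step.
```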
Attention is All You Need
Vaswani et al. - June 2017
The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles, by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.
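The Transformer's core primitive is scaled dot-product attention, Attention(Q, K, V) = softmax(QKᵀ/√d_k)V (Equation 1 in the paper). A minimal single-head numpy sketch, without the masking, batching, or multi-head projections of the full model:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V.

    Q: (n_q, d_k) queries, K: (n_k, d_k) keys, V: (n_k, d_v) values.
    Returns one attended vector per query, shape (n_q, d_v).
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # pairwise query-key similarities
    scores -= scores.max(axis=-1, keepdims=True)  # shift for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the keys
    return weights @ V                              # weighted sum of values
```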