This is a collection of presentation slides on Artificial Intelligence books and State-of-the-Art papers.
Reinforcement Learning: An Introduction by Sutton and Barto
- Chapter 1: Introduction [SlideShare] [Google Slides] [PDF]
- Chapter 2: Multi-armed Bandits [SlideShare] [Google Slides] [PDF]
- Chapter 3: Finite Markov Decision Processes [SlideShare] [Google Slides] [PDF]
- Chapter 4: Dynamic Programming [SlideShare] [Google Slides] [PDF]
- Chapter 5: Monte Carlo Methods [SlideShare] [Google Slides] [PDF]
- Chapter 6: Temporal-Difference Learning [SlideShare] [Google Slides] [PDF]
- Chapter 7: n-step Bootstrapping [SlideShare] [Google Slides] [PDF]
- Chapter 8: Planning and Learning with Tabular Methods [SlideShare] [Google Slides] [PDF]
- Chapter 10: On-policy Control with Approximation [SlideShare] [Google Slides] [PDF]
- Chapter 13: Policy Gradient Methods [SlideShare] [Google Slides] [PDF]
Learning Dexterous In-Hand Manipulation
OpenAI - August 2018
We use reinforcement learning (RL) to learn dexterous in-hand manipulation policies which can perform vision-based object reorientation on a physical Shadow Dexterous Hand. The training is performed in a simulated environment in which we randomize many of the physical properties of the system like friction coefficients and an object’s appearance. Our policies transfer to the physical robot despite being trained entirely in simulation. Our method does not rely on any human demonstrations, but many behaviors found in human manipulation emerge naturally, including finger gaiting, multi-finger coordination, and the controlled use of gravity. Our results were obtained using the same distributed RL system that was used to train OpenAI Five. We also include a video of our results: https://youtu.be/jwSbzNHGflM.
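The randomization described above can be sketched as drawing a fresh set of simulator parameters for each training episode. This is a minimal illustration only: the `SimParams` fields, the numeric ranges, and the `sample_sim_params` helper are assumptions made for this example and are not taken from the paper.

```python
import random
from dataclasses import dataclass


@dataclass
class SimParams:
    """Hypothetical per-episode simulator settings (not the paper's actual list)."""
    friction: float
    object_mass: float
    object_rgb: tuple


def sample_sim_params(rng):
    """Draw one randomized parameter set; all ranges are illustrative assumptions."""
    return SimParams(
        friction=rng.uniform(0.5, 1.5),
        object_mass=rng.uniform(0.03, 0.30),
        object_rgb=(rng.random(), rng.random(), rng.random()),
    )


rng = random.Random(0)
# One parameter set per simulated episode, so the policy never sees the same physics twice.
episodes = [sample_sim_params(rng) for _ in range(3)]
for params in episodes:
    print(params)
```

Because the policy is trained across many such variations, it cannot overfit to one particular simulated friction value or object appearance, which is the intuition behind transfer to the physical robot.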
Learning Montezuma’s Revenge from a Single Demonstration
T. Salimans, R. Chen - July 2018
We’ve trained an agent to achieve a high score of 74,500 on Montezuma’s Revenge from a single human demonstration, better than any previously published result. Our algorithm is simple: the agent plays a sequence of games starting from carefully chosen states from the demonstration, and learns from them by optimizing the game score using PPO, the same reinforcement learning algorithm that underpins OpenAI Five.
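One plausible way to pick the starting states is a backward curriculum: begin near the end of the demonstration and move the reset point earlier once the agent succeeds often enough. The sketch below assumes that scheme; the `next_start_index` helper, the success threshold, and the step size are hypothetical and are not specified in the summary above.

```python
def next_start_index(start_index, success_rate, threshold=0.2, step=1):
    """Move the reset point earlier in the demonstration once the agent
    succeeds often enough from the current one (threshold is an assumption)."""
    if success_rate >= threshold:
        return max(0, start_index - step)
    return start_index


demo_length = 10          # hypothetical number of saved demonstration states
start = demo_length - 1   # begin training close to the end of the demonstration

# Hypothetical success rates observed over successive training rounds.
for rate in [0.1, 0.5, 0.3, 0.0, 0.9]:
    start = next_start_index(start, rate)

print(start)
```

Starting close to the reward keeps each exploration problem short, so a standard policy-gradient learner such as PPO only has to solve a small piece of the game at a time.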
A Deeper Look at Experience Replay
S. Zhang, R. Sutton - December 2017
Experience replay is now widely used in deep reinforcement learning (RL) algorithms; in this paper we rethink its utility. It introduces a new hyperparameter, the memory buffer size, which needs careful tuning. Unfortunately, the importance of this hyperparameter has long been underestimated in the community. We conduct a systematic empirical study of experience replay under various function representations and show that a large replay buffer can significantly hurt performance. Moreover, we propose a simple O(1) method to remedy the negative influence of a large replay buffer, and we demonstrate its utility in both a simple grid world and challenging domains such as Atari games.
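To make the buffer-size hyperparameter concrete, here is a minimal fixed-capacity replay buffer. The summary does not spell out the O(1) remedy; this sketch assumes it amounts to always including the most recent transition in each sampled batch, and the `ReplayBuffer` class and its method names are hypothetical.

```python
import random


class ReplayBuffer:
    """Fixed-capacity circular buffer of transitions (illustrative sketch)."""

    def __init__(self, capacity, seed=None):
        self.capacity = capacity      # the memory buffer size hyperparameter
        self.storage = []
        self.next_slot = 0
        self.latest = None
        self.rng = random.Random(seed)

    def add(self, transition):
        """Store a transition, overwriting the oldest one when full."""
        if len(self.storage) < self.capacity:
            self.storage.append(transition)
        else:
            self.storage[self.next_slot] = transition
        self.next_slot = (self.next_slot + 1) % self.capacity
        self.latest = transition

    def sample(self, batch_size):
        """Uniformly sample batch_size - 1 stored transitions (with replacement)
        and append the newest one, which costs O(1) extra work per batch."""
        if self.latest is None:
            raise ValueError("cannot sample from an empty buffer")
        batch = [self.rng.choice(self.storage) for _ in range(batch_size - 1)]
        batch.append(self.latest)
        return batch


buf = ReplayBuffer(capacity=3, seed=0)
for t in range(5):
    buf.add(t)                # integers stand in for (s, a, r, s') tuples
batch = buf.sample(4)
print(batch)
```

With a very large buffer, a freshly collected transition may wait a long time before uniform sampling picks it; forcing it into the next batch removes that delay without changing the buffer size.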
Playing Atari with Deep Reinforcement Learning
Mnih et al. - December 2013
We present the first deep learning model to successfully learn control policies directly from high-dimensional sensory input using reinforcement learning. The model is a convolutional neural network, trained with a variant of Q-learning, whose input is raw pixels and whose output is a value function estimating future rewards. We apply our method to seven Atari 2600 games from the Arcade Learning Environment, with no adjustment of the architecture or learning algorithm. We find that it outperforms all previous approaches on six of the games and surpasses a human expert on three of them.
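The Q-learning variant mentioned above trains the network toward a bootstrapped target: the observed reward plus the discounted maximum of the estimated action values in the next state. A minimal sketch of that target follows; the function name and the discount value are illustrative assumptions, and the network that would produce `next_q_values` is omitted.

```python
def q_learning_target(reward, next_q_values, done, gamma=0.99):
    """Return the Q-learning regression target for one transition.

    next_q_values holds the estimated action values for the next state;
    for terminal transitions there is no future reward to bootstrap from.
    """
    if done:
        return reward
    return reward + gamma * max(next_q_values)


# Hypothetical transition: reward 1.0, two actions available in the next state.
target = q_learning_target(1.0, [0.5, 2.0], done=False, gamma=0.5)
print(target)
```

In training, the squared difference between this target and the network's current estimate for the action actually taken would serve as the loss.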