Overview

Pong is a two-dimensional sports game that simulates table tennis. The player controls an in-game paddle by moving it vertically across the left or right side of the screen. They can compete against another player controlling a second paddle on the opposing side. Players use the paddles to hit a ball back and forth. The goal is for each player to reach eleven points before the opponent; points are earned when one fails to return the ball to the other.
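The description above gives eleven points as the target. The results below come from the Atari 2600 version run in the Arcade Learning Environment, where a game ends when either side reaches 21 points. The agent receives a reward of +1 for each point it wins and -1 for each point it concedes, so a game's score is its points minus the opponent's, bounded by -21 and 21. This is why 21.0 is the best possible result in the tables. A minimal sketch of that tally follows. The `pong_score` helper and its encoding of each point as the string "agent" or "opponent" are hypothetical, for illustration only.

```python
from typing import Iterable

WIN_SCORE = 21  # Atari 2600 Pong ends when either side reaches 21 points


def pong_score(point_winners: Iterable[str]) -> int:
    """Agent points minus opponent points, i.e. the sum of +1/-1 per-point rewards.

    Hypothetical encoding: each element is "agent" or "opponent".
    """
    agent = opponent = 0
    for winner in point_winners:
        if winner == "agent":
            agent += 1
        else:
            opponent += 1
        if WIN_SCORE in (agent, opponent):
            break  # game over; any later points are never played
    return agent - opponent


print(pong_score(["opponent"] * 21))                   # -21: the agent never scores
print(pong_score(["opponent"] * 15 + ["agent"] * 21))  # 6: the agent wins 21-15
```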

Description from Wikipedia

State of the Art

Human Starts

Result | Method | Type | Reported in
19.1 | DDQN | DQN | Deep Reinforcement Learning with Double Q-learning
19.0 | RainbowDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning
18.9 | PERDDQN (prop) | DQN | Prioritized Experience Replay
18.9 | PERDDQN (rank) | DQN | Prioritized Experience Replay
18.9 | DistributionalDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning
18.8 | DuelingDDQN | DQN | Dueling Network Architectures for Deep Reinforcement Learning
18.7 | ApeX DQN | DQN | Distributed Prioritized Experience Replay
18.7 | PERDQN (rank) | DQN | Prioritized Experience Replay
18.4 | DuelingPERDQN | DQN | Dueling Network Architectures for Deep Reinforcement Learning
18.0 | DQN2015 | DQN | Dueling Network Architectures for Deep Reinforcement Learning
18.0 | NoisyNetDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning
16.71 | GorilaDQN | DQN | Massively Parallel Methods for Deep Reinforcement Learning
16.2 | DQN2015 | DQN | Massively Parallel Methods for Deep Reinforcement Learning
15.5 | Human | Human | Massively Parallel Methods for Deep Reinforcement Learning
11.4 | A3C FF (1 day) | PG | Asynchronous Methods for Deep Reinforcement Learning
10.7 | A3C LSTM | PG | Asynchronous Methods for Deep Reinforcement Learning
5.6 | A3C FF (4 days) | PG | Asynchronous Methods for Deep Reinforcement Learning
-18.0 | Random | Random | Massively Parallel Methods for Deep Reinforcement Learning
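The three tables differ in how each evaluation episode begins. Under human starts, introduced in "Massively Parallel Methods for Deep Reinforcement Learning", each episode resumes from a state sampled from a human expert's play. Under no-op starts (Mnih et al., 2015), the agent first takes a random number of no-op actions, up to 30. Under normal starts, the episode simply begins from the emulator's default reset. Because Atari games are deterministic, the randomized starts make it harder to score well by memorizing a single action sequence. The sketch below assumes a hypothetical environment interface (`reset`, `restore`, and `step` returning `(observation, reward, done)`), not the ALE or Gymnasium API, and runs against a toy `ToyEnv`; `noop_start`, `human_start`, and `evaluate` are likewise illustrative names. In practice `policy` would wrap the agent's action selection, for example the epsilon-greedy policy with epsilon = 0.05 that Mnih et al. used during evaluation.

```python
import random
from typing import Any, Callable, List, Sequence, Tuple

NOOP = 0  # assumption: action index 0 is the no-op


class ToyEnv:
    """Hypothetical stand-in for an emulator: +1 reward per non-no-op action,
    and the episode ends once the internal clock reaches 10 steps."""

    def __init__(self) -> None:
        self.t = 0

    def reset(self) -> int:
        self.t = 0
        return self.t

    def restore(self, state: int) -> int:
        # Jump to a saved state, as a human-starts evaluation would.
        self.t = state
        return self.t

    def step(self, action: int) -> Tuple[int, float, bool]:
        self.t += 1
        reward = 0.0 if action == NOOP else 1.0
        return self.t, reward, self.t >= 10


def noop_start(env: Any, rng: random.Random, max_noops: int = 30) -> Any:
    """No-op starts: reset, then take a random number (1..max_noops) of no-ops."""
    obs = env.reset()
    for _ in range(rng.randint(1, max_noops)):
        obs, _, done = env.step(NOOP)
        if done:  # the episode ended during the no-ops; start over without them
            return env.reset()
    return obs


def human_start(env: Any, rng: random.Random, start_states: Sequence[Any]) -> Any:
    """Human starts: resume from a state sampled from recorded human play."""
    return env.restore(rng.choice(start_states))


def evaluate(env: Any, policy: Callable[[Any], int], start: Callable[[], Any],
             episodes: int, max_steps: int) -> List[float]:
    """Return the undiscounted score of each evaluation episode."""
    scores = []
    for _ in range(episodes):
        obs = start()
        total, done, steps = 0.0, False, 0
        while not done and steps < max_steps:
            obs, reward, done = env.step(policy(obs))
            total += reward
            steps += 1
        scores.append(total)
    return scores


rng = random.Random(0)
env = ToyEnv()
always_act = lambda obs: 1  # toy policy that never chooses the no-op
noop_scores = evaluate(env, always_act, lambda: noop_start(env, rng), 30, 4500)
human_scores = evaluate(env, always_act, lambda: human_start(env, rng, [4, 7]), 30, 4500)
normal_scores = evaluate(env, always_act, env.reset, 30, 4500)
print(sum(noop_scores) / len(noop_scores), sum(human_scores) / len(human_scores))
```

A `max_steps` of 4,500 corresponds to the 5-minute episode cap used by Mnih et al., assuming 60 frames per second and each action repeated for 4 frames. With the deterministic toy environment, every normal-start episode produces the same score. The two randomized protocols spread the scores out.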

No-op Starts

Result | Method | Type | Reported in
21.0 | DDQN | DQN | Deep Reinforcement Learning with Double Q-learning
21.0 | DuelingDDQN | DQN | Dueling Network Architectures for Deep Reinforcement Learning
21.0 | NoisyNet-DQN | DQN | Noisy Networks for Exploration
21.0 | DuelingDQN | DQN | Noisy Networks for Exploration
21.0 | NoisyNet-DuelingDQN | DQN | Noisy Networks for Exploration
21.0 | NoisyNetDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning
20.9 | DuelingPERDQN | DQN | Dueling Network Architectures for Deep Reinforcement Learning
20.9 | ACKTR | PG | Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation
20.9 | ApeX DQN | DQN | Distributed Prioritized Experience Replay
20.9 | C51 | Misc | A Distributional Perspective on Reinforcement Learning
20.9 | RainbowDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning
20.8 | DistributionalDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning
20.7 | PERDDQN (prop) | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning
20.6 | PER | DQN | Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation
20.6 | PERDDQN (rank) | DQN | Dueling Network Architectures for Deep Reinforcement Learning
20.6 | DDQN+PopArt | DQN | Learning values across many orders of magnitude
20.0 | DQN | DQN | Noisy Networks for Exploration
18.9 | DQN2015 | DQN | Human-level control through deep reinforcement learning
18.3 | GorilaDQN | DQN | Massively Parallel Methods for Deep Reinforcement Learning
16.7 | DuelingPERDDQN | DQN | Deep Q-Learning from Demonstrations
14.6 | Human | Human | Dueling Network Architectures for Deep Reinforcement Learning
12.0 | NoisyNet-A3C | PG | Noisy Networks for Exploration
10.7 | DQfD | Imitation | Deep Q-Learning from Demonstrations
9.3 | Human | Human | Human-level control through deep reinforcement learning
7.0 | A3C | PG | Noisy Networks for Exploration
-17.4 | Contingency | Misc | Human-level control through deep reinforcement learning
-19 | Linear | Misc | Human-level control through deep reinforcement learning
-20.7 | Random | Random | Human-level control through deep reinforcement learning
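Several of the cited papers, beginning with Mnih et al. (2015), summarize Atari results as a human-normalized score, 100 x (agent - random) / (human - random). On this scale, 0 means random play and 100 means human level. The following is a minimal sketch with a hypothetical `human_normalized` helper. It is applied to the DQN2015, Human, and Random rows above, all three of which come from "Human-level control through deep reinforcement learning".

```python
def human_normalized(agent: float, human: float, random_score: float) -> float:
    """Percentage of the random-to-human gap covered by the agent (100 = human level)."""
    return 100.0 * (agent - random_score) / (human - random_score)


# No-op-start values from "Human-level control through deep reinforcement learning"
dqn, human, rand = 18.9, 9.3, -20.7
print(f"{human_normalized(dqn, human, rand):.1f}")  # 132.0
```

The two protocols report different human and random baselines (for example, 15.5 and -18.0 under human starts versus 9.3 and -20.7 here). As a result, normalized scores are comparable only within one protocol.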

Normal Starts

Result | Method | Type | Reported in
20.9 | TRPO (single path) | PG | Trust Region Policy Optimization
20.9 | TRPO (vine) | PG | Trust Region Policy Optimization
20.7 | ACER | PG | Proximal Policy Optimization Algorithms
20.7 | PPO | PG | Proximal Policy Optimization Algorithms
20 | DQN2013 | DQN | Playing Atari with Deep Reinforcement Learning
19.7 | A2C | PG | Proximal Policy Optimization Algorithms
-3 | Human | Human | Playing Atari with Deep Reinforcement Learning
-17 | Contingency | Misc | Playing Atari with Deep Reinforcement Learning
-19 | Sarsa | Misc | Playing Atari with Deep Reinforcement Learning
-20.4 | Random | Random | Playing Atari with Deep Reinforcement Learning