Overview

Breakout begins with eight rows of bricks, with each two rows a different color. The color order from the bottom up is yellow, green, orange and red. Using a single ball, the player must knock down as many bricks as possible by using the walls and/or the paddle below to ricochet the ball against the bricks and eliminate them. If the player’s paddle misses the ball’s rebound, he or she will lose a turn. The player has three turns to try to clear two screens of bricks. Yellow bricks earn one point each, green bricks earn three points, orange bricks earn five points and the top-level red bricks score seven points each. The paddle shrinks to one-half its size after the ball has broken through the red row and hit the upper wall. Ball speed increases at specific intervals: after four hits, after twelve hits, and after making contact with the orange and red rows.

Description from Wikipedia
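
For reference, the brick values and speed-up thresholds described above can be written down directly. The sketch below is only an illustration of those rules; the row numbering, names, and functions are our own and not part of any emulator or ALE API.

```python
# Illustrative sketch of the Breakout rules quoted above; not an emulator API.

# Rows are numbered 0-7 from the bottom; every two rows share a color.
ROW_COLORS = ["yellow", "yellow", "green", "green",
              "orange", "orange", "red", "red"]
POINTS = {"yellow": 1, "green": 3, "orange": 5, "red": 7}


def brick_points(row: int) -> int:
    """Points for destroying a brick in the given row (0 = bottom row)."""
    return POINTS[ROW_COLORS[row]]


def speed_ups(hits: int, hit_orange: bool, hit_red: bool) -> int:
    """Number of speed increases so far: after four hits, after twelve hits,
    and on contact with the orange and red rows."""
    return int(hits >= 4) + int(hits >= 12) + int(hit_orange) + int(hit_red)
```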

State of the Art

Human Starts

| Result | Method | Type | Score from |
| --- | --- | --- | --- |
| 766.8 | A3C LSTM | PG | Asynchronous Methods for Deep Reinforcement Learning |
| 756.5 | ApeX DQN | DQN | Distributed Prioritized Experience Replay |
| 681.9 | A3C FF (4 days) | PG | Asynchronous Methods for Deep Reinforcement Learning |
| 551.6 | A3C FF (1 day) | PG | Asynchronous Methods for Deep Reinforcement Learning |
| 548.7 | DistributionalDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 481.1 | PERDQN (rank) | DQN | Prioritized Experience Replay |
| 423.3 | NoisyNetDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 411.6 | DuelingDDQN | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 379.5 | RainbowDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 371.6 | PERDDQN (prop) | DQN | Prioritized Experience Replay |
| 368.9 | DDQN | DQN | Deep Reinforcement Learning with Double Q-learning |
| 354.6 | DuelingPERDQN | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 354.5 | DQN2015 | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 343.0 | PERDDQN (rank) | DQN | Prioritized Experience Replay |
| 313.03 | GorilaDQN | DQN | Massively Parallel Methods for Deep Reinforcement Learning |
| 303.9 | DQN2015 | DQN | Massively Parallel Methods for Deep Reinforcement Learning |
| 27.9 | Human | Human | Massively Parallel Methods for Deep Reinforcement Learning |
| 1.6 | Random | Random | Massively Parallel Methods for Deep Reinforcement Learning |

No-op Starts

| Result | Method | Type | Score from |
| --- | --- | --- | --- |
| 800.9 | ApeX DQN | DQN | Distributed Prioritized Experience Replay |
| 748.0 | C51 | Misc | A Distributional Perspective on Reinforcement Learning |
| 735.7 | ACKTR | PG | Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation |
| 612.5 | DistributionalDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 516.0 | NoisyNet-DQN | DQN | Noisy Networks for Exploration |
| 496.0 | A3C | PG | Noisy Networks for Exploration |
| 459.1 | NoisyNetDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 418.5 | DDQN | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 417.5 | RainbowDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 402.2 | GorilaDQN | DQN | Massively Parallel Methods for Deep Reinforcement Learning |
| 401.2 | DQN2015 | DQN | Human-level control through deep reinforcement learning |
| 396.0 | DQN | DQN | Noisy Networks for Exploration |
| 385.5 | DQN2015 | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 381.5 | PERDDQN (prop) | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 375.0 | DDQN | DQN | Deep Reinforcement Learning with Double Q-learning |
| 374.0 | NoisyNet-A3C | PG | Noisy Networks for Exploration |
| 373.9 | PER | DQN | Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation |
| 373.9 | PERDDQN (rank) | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 366.0 | DuelingPERDQN | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 345.3 | DuelingDDQN | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 344.1 | DDQN+PopArt | DQN | Learning values across many orders of magnitude |
| 308.1 | DQfD | Imitation | Deep Q-Learning from Demonstrations |
| 275.0 | DuelingPERDDQN | DQN | Deep Q-Learning from Demonstrations |
| 263.0 | NoisyNet-DuelingDQN | DQN | Noisy Networks for Exploration |
| 200.0 | DuelingDQN | DQN | Noisy Networks for Exploration |
| 31.8 | Human | Human | Human-level control through deep reinforcement learning |
| 30.5 | Human | Human | Dueling Network Architectures for Deep Reinforcement Learning |
| 6.1 | Contingency | Misc | Human-level control through deep reinforcement learning |
| 5.2 | Linear | Misc | Human-level control through deep reinforcement learning |
| 1.7 | Random | Random | Human-level control through deep reinforcement learning |

Normal Starts

| Result | Method | Type | Score from |
| --- | --- | --- | --- |
| 456.4 | ACER | PG | Proximal Policy Optimization Algorithms |
| 303.0 | A2C | PG | Proximal Policy Optimization Algorithms |
| 274.8 | PPO | PG | Proximal Policy Optimization Algorithms |
| 168 | DQN2013 | DQN | Playing Atari with Deep Reinforcement Learning |
| 34.2 | TRPO (vine) | PG | Trust Region Policy Optimization |
| 31 | Human | Human | Playing Atari with Deep Reinforcement Learning |
| 10.8 | TRPO (single path) | PG | Trust Region Policy Optimization |
| 6 | Contingency | Misc | Playing Atari with Deep Reinforcement Learning |
| 5.2 | Sarsa | Misc | Playing Atari with Deep Reinforcement Learning |
| 1.2 | Random | Random | Playing Atari with Deep Reinforcement Learning |
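
The three conditions above differ only in how evaluation episodes are started: Normal Starts simply reset the emulator, No-op Starts (from Human-level control through deep reinforcement learning) begin each episode with a random number of no-op actions, up to 30, before the agent takes control, and Human Starts (from Massively Parallel Methods for Deep Reinforcement Learning) resume from states sampled from human play. A minimal sketch of no-op starts evaluation is given below, assuming the classic `gym` Atari API; the environment id, the omitted preprocessing wrappers, and the `agent.act()` interface are assumptions, not code from any of the cited papers.

```python
import random

import gym  # assumes gym[atari] with the classic (pre-gymnasium) step() signature


def evaluate_noop_starts(agent, episodes=30, max_noops=30):
    """Average episode score under the no-op starts protocol: each episode
    begins with a random number (1..max_noops) of no-op actions."""
    env = gym.make("BreakoutNoFrameskip-v4")  # preprocessing wrappers omitted for brevity
    scores = []
    for _ in range(episodes):
        obs = env.reset()
        done, score = False, 0.0
        for _ in range(random.randint(1, max_noops)):
            obs, reward, done, _ = env.step(0)  # action 0 is NOOP in the ALE action set
            score += reward
            if done:
                obs = env.reset()
                done = False
        while not done:
            action = agent.act(obs)  # agent.act() is a placeholder interface
            obs, reward, done, _ = env.step(action)
            score += reward
        scores.append(score)
    env.close()
    return sum(scores) / len(scores)
```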