# Atari Breakout Environment

## Overview

> Breakout begins with eight rows of bricks, with each two rows a different color. The color order from the bottom up is yellow, green, orange and red. Using a single ball, the player must knock down as many bricks as possible by using the walls and/or the paddle below to ricochet the ball against the bricks and eliminate them. If the player’s paddle misses the ball’s rebound, he or she will lose a turn. The player has three turns to try to clear two screens of bricks. Yellow bricks earn one point each, green bricks earn three points, orange bricks earn five points and the top-level red bricks score seven points each. The paddle shrinks to one-half its size after the ball has broken through the red row and hit the upper wall. Ball speed increases at specific intervals: after four hits, after twelve hits, and after making contact with the orange and red rows.

*Description from Wikipedia*
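
The game is exposed to RL agents through the Arcade Learning Environment (ALE). As a quick reference, here is a minimal sketch of interacting with it through the Gymnasium interface, assuming the `gymnasium` and `ale-py` packages (and the Atari ROMs) are installed; the `ALE/Breakout-v5` id and the `register_envs` call follow recent ale-py documentation and may differ in older versions:

```python
import gymnasium as gym
import ale_py

gym.register_envs(ale_py)  # make the ALE/... environment ids visible to Gymnasium

env = gym.make("ALE/Breakout-v5")
obs, info = env.reset(seed=0)

total_reward = 0.0
terminated = truncated = False
while not (terminated or truncated):
    action = env.action_space.sample()  # random policy, for illustration only
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward

print("Episode return:", total_reward)
env.close()
```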

## Performances of RL Agents

We list the scores reported for various reinforcement learning algorithms that were tested in this environment. These results are collected from RL Database. If this page was helpful, please consider giving it a star!

### Human Starts

In this protocol, evaluation episodes begin from start states sampled from human play (as introduced in Massively Parallel Methods for Deep Reinforcement Learning), so agents cannot rely on a single deterministic trajectory.

| Result | Algorithm | Source |
|--------|-----------|--------|
| 766.8 | A3C LSTM | Asynchronous Methods for Deep Reinforcement Learning |
| 681.9 | A3C FF | Asynchronous Methods for Deep Reinforcement Learning |
| 551.6 | A3C FF 1 day | Asynchronous Methods for Deep Reinforcement Learning |
| 548.7 | Distributional DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 481.1 | Prioritized DQN (rank) | Prioritized Experience Replay |
| 411.6 | DuDQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 379.5 | Rainbow | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 371.6 | Prioritized DDQN (prop, tuned) | Prioritized Experience Replay |
| 368.9 | DDQN (tuned) | Deep Reinforcement Learning with Double Q-learning |
| 354.6 | PDD DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 343.0 | Prioritized DDQN (rank, tuned) | Prioritized Experience Replay |
| 338.7 | DDQN | Deep Reinforcement Learning with Double Q-learning |
| 313.03 | Gorila DQN | Massively Parallel Methods for Deep Reinforcement Learning |
| 303.9 | DQN | Massively Parallel Methods for Deep Reinforcement Learning |
| 27.9 | Human | Massively Parallel Methods for Deep Reinforcement Learning |
| 1.6 | Random | Massively Parallel Methods for Deep Reinforcement Learning |
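
The Human and Random rows above are the baselines that the cited papers use to report human-normalized scores. As a worked example (the formula is the standard one from the DQN/Rainbow line of work; the function name below is ours):

```python
def human_normalized_score(agent: float, random: float, human: float) -> float:
    """Standard human-normalized score: 100 * (agent - random) / (human - random)."""
    return 100.0 * (agent - random) / (human - random)

# Using the Human Starts values above: A3C LSTM scores roughly 2910% of the human baseline.
print(human_normalized_score(agent=766.8, random=1.6, human=27.9))
```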

### No-op Starts

In this protocol, each evaluation episode begins with a random number of no-op actions (up to 30) before the agent takes control, as in Human-level control through deep reinforcement learning; a minimal sketch of this protocol follows the table.

| Result | Algorithm | Source |
|--------|-----------|--------|
| 787.34 | IMPALA (deep) | IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures |
| 766 | QR-DQN-0 | Distributional Reinforcement Learning with Quantile Regression |
| 748 | C51 | A Distributional Perspective on Reinforcement Learning |
| 742 | QR-DQN-1 | Distributional Reinforcement Learning with Quantile Regression |
| 735.7 | ACKTR | Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation |
| 734 | IQN | Implicit Quantile Networks for Distributional Reinforcement Learning |
| 640.43 | IMPALA (shallow) | IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures |
| 612.5 | Distributional DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 581.6 | A2C | Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation |
| 518.4 | Reactor | The Reactor: A fast and sample-efficient Actor-Critic agent for Reinforcement Learning |
| 516 | NoisyNet DQN | Noisy Networks for Exploration |
| 514.8 | Reactor | The Reactor: A fast and sample-efficient Actor-Critic agent for Reinforcement Learning |
| 509.5 | Reactor ND | The Reactor: A fast and sample-efficient Actor-Critic agent for Reinforcement Learning |
| 496 | A3C | Noisy Networks for Exploration |
| 418.5 | DDQN | A Distributional Perspective on Reinforcement Learning |
| 417.5 | Rainbow | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 402.2 | Gorila DQN | Massively Parallel Methods for Deep Reinforcement Learning |
| 401.2 | DQN | Human-level control through deep reinforcement learning |
| 396 | DQN | Noisy Networks for Exploration |
| 385.5 | DQN | A Distributional Perspective on Reinforcement Learning |
| 375.0 | DDQN | Deep Reinforcement Learning with Double Q-learning |
| 374 | NoisyNet A3C | Noisy Networks for Exploration |
| 366.0 | PDD DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 345.3 | DuDQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 263 | NoisyNet DuDQN | Noisy Networks for Exploration |
| 200 | DuDQN | Noisy Networks for Exploration |
| 35.67 | IMPALA (deep, multitask) | IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures |
| 31.8 | Human | Human-level control through deep reinforcement learning |
| 30.5 | Human | Dueling Network Architectures for Deep Reinforcement Learning |
| 14.7 | TRPO | Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation |
| 6.1 | Contingency | Human-level control through deep reinforcement learning |
| 5.2 | Linear | Human-level control through deep reinforcement learning |
| 1.7 | Random | Human-level control through deep reinforcement learning |
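
As referenced above, here is a minimal sketch of how the no-op starts protocol is commonly implemented. The function and parameter names are ours, and details such as the number of evaluation episodes, time limits, and the exact no-op cap vary slightly between the cited papers:

```python
import random

import gymnasium as gym
import ale_py

gym.register_envs(ale_py)


def evaluate_noop_starts(policy, episodes=30, max_noops=30, seed=0):
    """Average episode return when each episode begins with a random number of no-ops."""
    env = gym.make("ALE/Breakout-v5")
    rng = random.Random(seed)
    returns = []
    for ep in range(episodes):
        obs, info = env.reset(seed=seed + ep)
        terminated = truncated = False
        # Action 0 is NOOP in the ALE action set.
        for _ in range(rng.randint(1, max_noops)):
            obs, reward, terminated, truncated, info = env.step(0)
            if terminated or truncated:
                break
        episode_return = 0.0
        while not (terminated or truncated):
            obs, reward, terminated, truncated, info = env.step(policy(obs))
            episode_return += reward
        returns.append(episode_return)
    env.close()
    return sum(returns) / len(returns)
```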

### Normal Starts

In this protocol, scores are reported from ordinary environment resets, without the human-start or no-op randomization used above.

| Result | Algorithm | Source |
|--------|-----------|--------|
| 456.4 | ACER | Proximal Policy Optimization Algorithms |
| 448.514 | ACKTR | RL Baselines Zoo b76641e |
| 384.865 | A2C | RL Baselines Zoo b76641e |
| 380 | UCC-I | Trust Region Policy Optimization |
| 303.0 | A2C | Proximal Policy Optimization Algorithms |
| 274.8 | PPO | Proximal Policy Optimization Algorithms |
| 228.594 | PPO | RL Baselines Zoo b76641e |
| 225 | DQN2013 Best | Playing Atari with Deep Reinforcement Learning |
| 191.165 | DQN | RL Baselines Zoo b76641e |
| 168 | DQN2013 | Playing Atari with Deep Reinforcement Learning |
| 131.46 | DQN | OpenAI Baselines cbd21ef |
| 114.26 | PPO | OpenAI Baselines cbd21ef |
| 113.58 | ACKTR | OpenAI Baselines cbd21ef |
| 82.94 | ACER | OpenAI Baselines cbd21ef |
| 81.61 | PPO (MPI) | OpenAI Baselines cbd21ef |
| 59.72 | A2C | OpenAI Baselines cbd21ef |
| 52 | HNeat Best | Playing Atari with Deep Reinforcement Learning |
| 34.2 | TRPO - vine | Trust Region Policy Optimization |
| 31 | Human | Playing Atari with Deep Reinforcement Learning |
| 14.98 | TRPO (MPI) | OpenAI Baselines cbd21ef |
| 10.8 | TRPO - single path | Trust Region Policy Optimization |
| 6 | Contingency | Playing Atari with Deep Reinforcement Learning |
| 5.2 | Sarsa | Playing Atari with Deep Reinforcement Learning |
| 4 | HNeat Pixel | Playing Atari with Deep Reinforcement Learning |
| 1.2 | Random | Playing Atari with Deep Reinforcement Learning |