Atari Environments

Overview

The Atari 2600 is a video game console released by Atari in 1977. It hosted popular games such as Breakout, Ms. Pacman, and Space Invaders. Since Deep Q-Networks were introduced by Mnih et al. in 2013, the Atari 2600 has been the standard environment for testing new Reinforcement Learning algorithms. It remains a challenging testbed because of its high-dimensional video input (210 x 160 pixels at 60 Hz) and the wide variety of tasks across games.

The Atari 2600 environments were originally provided through the Arcade Learning Environment (ALE). OpenAI Gym wraps these environments to offer a more standardized interface, and provides 59 Atari 2600 games as environments.

State of the Art

Note: Most papers evaluate on a set of 57 Atari 2600 games, a couple of which are not supported by OpenAI Gym.

These are the published state-of-the-art results for the Atari 2600 testbed. To test the robustness of the agent, most papers use one or both of two settings, no-op starts and human starts, both devised to provide nondeterministic starting positions. In the no-op starts setting, the agent selects the “do nothing” action up to 30 times at the start of an episode, which gives the agent a random starting position. This setting originates from the DQN2015 paper by Mnih et al. (2015). In the human starts setting, the agent starts from one of 100 starting points sampled from a human professional’s gameplay. This setting originates from the GorilaDQN paper by Nair et al. (2015).
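For illustration, here is a minimal sketch of the no-op starts protocol, with a random policy standing in for a trained agent. It assumes action 0 is the ALE "do nothing" action; the environment name and episode count are arbitrary choices, not part of any published protocol.

import random
import gym

def evaluate_noop_starts(env_name='PongNoFrameskip-v4', episodes=10, max_noops=30):
    env = gym.make(env_name)
    returns = []
    for _ in range(episodes):
        env.reset()
        episode_return, done = 0.0, False
        # Start the episode with a random number (1 to 30) of no-op actions.
        for _ in range(random.randint(1, max_noops)):
            _, reward, done, _ = env.step(0)
            episode_return += reward
            if done:
                env.reset()
                done = False
        # Then let the policy act until the episode ends.
        while not done:
            action = env.action_space.sample()  # replace with the agent's policy
            _, reward, done, _ = env.step(action)
            episode_return += reward
        returns.append(episode_return)
    env.close()
    return returns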

Median

One popular way to summarize an agent’s overall performance is the median human-normalized score. You can read more about the choice of this metric in the Rainbow paper. For a fairer comparison of algorithms, we only include results that were evaluated on a majority of the available games.
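As a sketch of how this metric is computed: each game’s raw score is normalized so that a random agent scores 0% and the human baseline scores 100%, and the median is taken over all games. The per-game numbers below are made-up placeholders, not published results.

import numpy as np

def human_normalized(agent, random_score, human):
    # 0% corresponds to a random agent, 100% to the human baseline.
    return 100.0 * (agent - random_score) / (human - random_score)

# game: (agent score, random score, human score) -- hypothetical values
raw_scores = {
    'GameA': (350.0, 2.0, 30.0),
    'GameB': (15.0, -21.0, 10.0),
    'GameC': (4000.0, 70.0, 20000.0),
}

normalized = [human_normalized(a, r, h) for a, r, h in raw_scores.values()]
print('Median human-normalized score: %.1f%%' % np.median(normalized))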

No-op starts

| Median | Method | Score from |
| --- | --- | --- |
| 434% | Ape-X DQN¹ | Distributed Prioritized Experience Replay |
| 331% | UNREAL² | Distributed Prioritized Experience Replay |
| 223% | Rainbow DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 178% | C51 | A Distributional Perspective on Reinforcement Learning |
| 172% | NoisyNet-Dueling DDQN | Noisy Networks for Exploration |
| 164% | Distributional DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 151% | Dueling DDQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 140% | Prioritized DDQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 132% | Dueling DDQN | Noisy Networks for Exploration |
| 123% | NoisyNet-DQN | Noisy Networks for Exploration |
| 118% | NoisyNet-DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 117% | DDQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 96% | Gorila DQN | Distributed Prioritized Experience Replay |
| 83% | DQN³ | Noisy Networks for Exploration |
| 80% | A3C | Noisy Networks for Exploration |
| 79% | DQN³ | A Distributional Perspective on Reinforcement Learning |

¹ Ape-X DQN used far more (about 100x) environment frames than the other results, although its training time was about half that of the other DQN results.
² Hyperparameters were tuned per game.
³ Only evaluated on 49 games.

Human starts

| Median | Method | Score from |
| --- | --- | --- |
| 358% | Ape-X DQN¹ | Distributed Prioritized Experience Replay |
| 250% | UNREAL² | Distributed Prioritized Experience Replay |
| 153% | Rainbow DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 128% | Prioritized DDQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 125% | C51 | Distributed Prioritized Experience Replay |
| 125% | Distributional DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 117% | Dueling DDQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 116% | A3C | Dueling Network Architectures for Deep Reinforcement Learning |
| 110% | DDQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 102% | Noisy DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 78% | Gorila DQN | Distributed Prioritized Experience Replay |
| 68% | DQN³ | Rainbow: Combining Improvements in Deep Reinforcement Learning |

¹ Ape-X DQN used far more (about 100x) environment frames than the other results, although its training time was about half that of the other DQN results.
² Hyperparameters were tuned per game.
³ Only evaluated on 49 games.

Individual Environments

Although the median metric above is a useful way to compare the overall effectiveness of algorithms, different algorithms have different strengths. Thus, we also include the state-of-the-art result for each game.

If you want to see how other methods performed in each Atari 2600 game, you can check the results of all methods by clicking the name of the game in the table below.

No-op Starts

| Game | Result | Method | Type | Score from |
| --- | --- | --- | --- | --- |
| Alien | 40804.9 | ApeX DQN | DQN | Distributed Prioritized Experience Replay |
| Amidar | 8659.2 | ApeX DQN | DQN | Distributed Prioritized Experience Replay |
| Assault | 24559.4 | ApeX DQN | DQN | Distributed Prioritized Experience Replay |
| Asterix | 428200.3 | RainbowDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| Asteroids | 155495.1 | ApeX DQN | DQN | Distributed Prioritized Experience Replay |
| Atlantis | 3433182.0 | ACKTR | PG | Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation |
| Bank Heist | 1716.4 | ApeX DQN | DQN | Distributed Prioritized Experience Replay |
| Battle Zone | 98895.0 | ApeX DQN | DQN | Distributed Prioritized Experience Replay |
| Beam Rider | 63305.2 | ApeX DQN | DQN | Distributed Prioritized Experience Replay |
| Bowling | 160.7 | Human | Human | Dueling Network Architectures for Deep Reinforcement Learning |
| Boxing | 100.0 | ApeX DQN | DQN | Distributed Prioritized Experience Replay |
| Breakout | 800.9 | ApeX DQN | DQN | Distributed Prioritized Experience Replay |
| Centipede | 49065.8 | DDQN+PopArt | DQN | Learning values across many orders of magnitude |
| Chopper Command | 721851.0 | ApeX DQN | DQN | Distributed Prioritized Experience Replay |
| Crazy Climber | 320426.0 | ApeX DQN | DQN | Distributed Prioritized Experience Replay |
| Demon Attack | 274176.7 | ACKTR | PG | Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation |
| Double Dunk | 23.5 | ApeX DQN | DQN | Distributed Prioritized Experience Replay |
| Enduro | 3454.0 | C51 | Misc | A Distributional Perspective on Reinforcement Learning |
| Fishing Derby | 57.0 | NoisyNet-DuelingDQN | DQN | Noisy Networks for Exploration |
| Freeway | 34.0 | RainbowDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| Frostbite | 9590.5 | RainbowDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| Gopher | 120500.9 | ApeX DQN | DQN | Distributed Prioritized Experience Replay |
| Gravitar | 3351.4 | Human | Human | Dueling Network Architectures for Deep Reinforcement Learning |
| H.E.R.O. | 105929.4 | DQfD | Imitation | Deep Q-Learning from Demonstrations |
| Ice Hockey | 33.0 | ApeX DQN | DQN | Distributed Prioritized Experience Replay |
| James Bond | 21322.5 | ApeX DQN | DQN | Distributed Prioritized Experience Replay |
| Kangaroo | 16200.0 | PERDDQN (rank) | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| Krull | 22849.0 | NoisyNet-A3C | PG | Noisy Networks for Exploration |
| Kung-Fu Master | 97829.5 | ApeX DQN | DQN | Distributed Prioritized Experience Replay |
| Montezuma’s Revenge | 41098.4 | YouTube | Imitation | Playing hard exploration games by watching YouTube |
| Ms. Pacman | 15693 | Human | Human | Human-level control through deep reinforcement learning |
| Name This Game | 25783.3 | ApeX DQN | DQN | Distributed Prioritized Experience Replay |
| Pong | 21.0 | DDQN | DQN | Deep Reinforcement Learning with Double Q-learning |
| Private Eye | 98763.2 | YouTube | Imitation | Playing hard exploration games by watching YouTube |
| Q*Bert | 302391.3 | ApeX DQN | DQN | Distributed Prioritized Experience Replay |
| River Raid | 63864.4 | ApeX DQN | DQN | Distributed Prioritized Experience Replay |
| Road Runner | 234352.0 | NoisyNet-DuelingDQN | DQN | Noisy Networks for Exploration |
| Robotank | 73.8 | ApeX DQN | DQN | Distributed Prioritized Experience Replay |
| Seaquest | 392952.3 | ApeX DQN | DQN | Distributed Prioritized Experience Replay |
| Space Invaders | 54681.0 | ApeX DQN | DQN | Distributed Prioritized Experience Replay |
| Star Gunner | 434342.5 | ApeX DQN | DQN | Distributed Prioritized Experience Replay |
| Tennis | 23.9 | ApeX DQN | DQN | Distributed Prioritized Experience Replay |
| Time Pilot | 87085.0 | ApeX DQN | DQN | Distributed Prioritized Experience Replay |
| Tutankham | 314.3 | ACKTR | PG | Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation |
| Up and Down | 436665.8 | ACKTR | PG | Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation |
| Venture | 1813.0 | ApeX DQN | DQN | Distributed Prioritized Experience Replay |
| Video Pinball | 949604.0 | C51 | Misc | A Distributional Perspective on Reinforcement Learning |
| Wizard of Wor | 46204.0 | ApeX DQN | DQN | Distributed Prioritized Experience Replay |
| Zaxxon | 42285.5 | ApeX DQN | DQN | Distributed Prioritized Experience Replay |
| Berzerk | 57196.7 | ApeX DQN | DQN | Distributed Prioritized Experience Replay |
| Defender | 411943.5 | ApeX DQN | DQN | Distributed Prioritized Experience Replay |
| Phoenix | 224491.1 | ApeX DQN | DQN | Distributed Prioritized Experience Replay |
| Pit Fall | 60258.9 | YouTube | Imitation | Playing hard exploration games by watching YouTube |
| Skiing | -4336.9 | Human | Human | Dueling Network Architectures for Deep Reinforcement Learning |
| Solaris | 12380.0 | A3C | PG | Noisy Networks for Exploration |
| Surround | 10.0 | NoisyNet-DuelingDQN | DQN | Noisy Networks for Exploration |
| Yars Revenge | 148594.8 | ApeX DQN | DQN | Distributed Prioritized Experience Replay |

Human Starts

| Game | Result | Method | Type | Score from |
| --- | --- | --- | --- | --- |
| Alien | 17731.5 | ApeX DQN | DQN | Distributed Prioritized Experience Replay |
| Amidar | 1540.4 | Human | Human | Massively Parallel Methods for Deep Reinforcement Learning |
| Assault | 24404.6 | ApeX DQN | DQN | Distributed Prioritized Experience Replay |
| Asterix | 395599.5 | DistributionalDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| Asteroids | 117303.4 | ApeX DQN | DQN | Distributed Prioritized Experience Replay |
| Atlantis | 918714.5 | ApeX DQN | DQN | Distributed Prioritized Experience Replay |
| Bank Heist | 1200.8 | ApeX DQN | DQN | Distributed Prioritized Experience Replay |
| Battle Zone | 92275.0 | ApeX DQN | DQN | Distributed Prioritized Experience Replay |
| Beam Rider | 72233.7 | ApeX DQN | DQN | Distributed Prioritized Experience Replay |
| Bowling | 146.5 | Human | Human | Massively Parallel Methods for Deep Reinforcement Learning |
| Boxing | 80.9 | ApeX DQN | DQN | Distributed Prioritized Experience Replay |
| Breakout | 766.8 | A3C LSTM | PG | Asynchronous Methods for Deep Reinforcement Learning |
| Centipede | 10321.9 | Human | Human | Massively Parallel Methods for Deep Reinforcement Learning |
| Chopper Command | 576601.5 | ApeX DQN | DQN | Distributed Prioritized Experience Replay |
| Crazy Climber | 263953.5 | ApeX DQN | DQN | Distributed Prioritized Experience Replay |
| Demon Attack | 133002.1 | ApeX DQN | DQN | Distributed Prioritized Experience Replay |
| Double Dunk | 22.3 | ApeX DQN | DQN | Distributed Prioritized Experience Replay |
| Enduro | 2223.9 | DuelingPERDQN | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| Fishing Derby | 22.6 | RainbowDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| Freeway | 29.1 | RainbowDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| Frostbite | 6511.5 | ApeX DQN | DQN | Distributed Prioritized Experience Replay |
| Gopher | 121168.2 | ApeX DQN | DQN | Distributed Prioritized Experience Replay |
| Gravitar | 3116.0 | Human | Human | Massively Parallel Methods for Deep Reinforcement Learning |
| H.E.R.O. | 50496.8 | RainbowDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| Ice Hockey | 24.0 | ApeX DQN | DQN | Distributed Prioritized Experience Replay |
| James Bond | 18992.3 | ApeX DQN | DQN | Distributed Prioritized Experience Replay |
| Kangaroo | 12185.0 | PERDDQN (rank) | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| Krull | 11209.5 | PERDQN (rank) | DQN | Prioritized Experience Replay |
| Kung-Fu Master | 72068.0 | ApeX DQN | DQN | Distributed Prioritized Experience Replay |
| Montezuma’s Revenge | 4182.0 | Human | Human | Massively Parallel Methods for Deep Reinforcement Learning |
| Ms. Pacman | 15375.0 | Human | Human | Massively Parallel Methods for Deep Reinforcement Learning |
| Name This Game | 23829.9 | ApeX DQN | DQN | Distributed Prioritized Experience Replay |
| Pong | 19.1 | DDQN | DQN | Deep Reinforcement Learning with Double Q-learning |
| Private Eye | 64169.1 | Human | Human | Massively Parallel Methods for Deep Reinforcement Learning |
| Q*Bert | 380152.1 | ApeX DQN | DQN | Distributed Prioritized Experience Replay |
| River Raid | 49982.8 | ApeX DQN | DQN | Distributed Prioritized Experience Replay |
| Road Runner | 127111.5 | ApeX DQN | DQN | Distributed Prioritized Experience Replay |
| Robotank | 68.5 | ApeX DQN | DQN | Distributed Prioritized Experience Replay |
| Seaquest | 377179.8 | ApeX DQN | DQN | Distributed Prioritized Experience Replay |
| Space Invaders | 50699.3 | ApeX DQN | DQN | Distributed Prioritized Experience Replay |
| Star Gunner | 432958.0 | ApeX DQN | DQN | Distributed Prioritized Experience Replay |
| Tennis | 23.0 | ApeX DQN | DQN | Distributed Prioritized Experience Replay |
| Time Pilot | 71543.0 | ApeX DQN | DQN | Distributed Prioritized Experience Replay |
| Tutankham | 156.3 | A3C FF (4 days) | PG | Asynchronous Methods for Deep Reinforcement Learning |
| Up and Down | 347912.2 | ApeX DQN | DQN | Distributed Prioritized Experience Replay |
| Venture | 1039.0 | Human | Human | Massively Parallel Methods for Deep Reinforcement Learning |
| Video Pinball | 873988.5 | ApeX DQN | DQN | Distributed Prioritized Experience Replay |
| Wizard of Wor | 46897.0 | ApeX DQN | DQN | Distributed Prioritized Experience Replay |
| Zaxxon | 37672.0 | ApeX DQN | DQN | Distributed Prioritized Experience Replay |
| Berzerk | 55598.9 | ApeX DQN | DQN | Distributed Prioritized Experience Replay |
| Defender | 399865.3 | ApeX DQN | DQN | Distributed Prioritized Experience Replay |
| Phoenix | 188788.5 | ApeX DQN | DQN | Distributed Prioritized Experience Replay |
| Pit Fall | 5998.9 | Human | Human | Deep Reinforcement Learning with Double Q-learning |
| Skiing | -3686.6 | Human | Human | Deep Reinforcement Learning with Double Q-learning |
| Solaris | 11032.6 | Human | Human | Deep Reinforcement Learning with Double Q-learning |
| Surround | 7.0 | RainbowDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| Yars Revenge | 131701.1 | ApeX DQN | DQN | Distributed Prioritized Experience Replay |

Installation

Prerequisites

To install the Atari 2600 environment, you need the OpenAI Gym toolkit. Read this page to learn how to install OpenAI Gym.

Installation via pip

If you did a full install of OpenAI Gym, the Atari 2600 environments should already be installed. Otherwise, you can install them with a single pip command:

pip3 install gym[atari]

Test Installation

You can run a simple random agent to make sure the Atari 2600 environments were installed correctly.

import gym

# Create the Pong environment and play one episode with random actions.
env = gym.make('Pong-v0')
env.reset()
done = False
while not done:
    # step() returns (observation, reward, done, info); we only need done here.
    _, _, done, _ = env.step(env.action_space.sample())
    env.render()
env.close()

Variants

In OpenAI Gym, each game has a few variants, distinguished by their suffixes. Through these variants, you can configure frame skipping and sticky actions. Frame skipping is a technique where the agent only selects an action on every $k$-th frame, and that action is repeated for the next $k$ frames. Sticky actions is a technique where, with some nonzero probability $p$, the previous action is repeated regardless of the action the agent chose. This adds stochasticity to the otherwise deterministic Atari 2600 environments.

For example, there are six variants for the Pong environment.

| Name | Frame Skip $k$ | Repeat action probability $p$ |
| --- | --- | --- |
| Pong-v0 | 2~4¹ | 0.25 |
| Pong-v4 | 2~4¹ | 0 |
| PongDeterministic-v0 | 4² | 0.25 |
| PongDeterministic-v4³ | 4² | 0 |
| PongNoFrameskip-v0 | 1 | 0.25 |
| PongNoFrameskip-v4 | 1 | 0 |

¹ $k$ is chosen randomly at every step from $\{2, 3, 4\}$.
² For Space Invaders, the Deterministic variant uses $k = 3$. This is because with $k = 4$ the lasers are invisible, since the frame skip coincides with the blinking frequency of the lasers.
³ Deterministic-v4 is the configuration used to assess Deep Q-Networks.
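To make the frame-skipping idea concrete, here is a minimal sketch of frame skipping implemented as a wrapper on top of a NoFrameskip variant: the chosen action is repeated for $k$ frames and the rewards are summed. This is only an illustration, not the exact built-in ALE implementation (the Deterministic variants already do this internally).

import gym

class FrameSkip(gym.Wrapper):
    def __init__(self, env, k=4):
        super().__init__(env)
        self.k = k

    def step(self, action):
        # Repeat the same action for k frames and accumulate the reward.
        total_reward = 0.0
        obs, done, info = None, False, {}
        for _ in range(self.k):
            obs, reward, done, info = self.env.step(action)
            total_reward += reward
            if done:
                break
        return obs, total_reward, done, info

# Apply manual frame skipping to the variant that performs no skipping itself.
env = FrameSkip(gym.make('PongNoFrameskip-v4'), k=4)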

For more details about frame skipping and sticky actions, check Sections 2 and 5 of the ALE whitepaper: Revisiting the Arcade Learning Environment.

Also, there are RAM environments such as Pong-ram-v0, where the observation is the 128-byte RAM of the Atari machine instead of the 210 x 160 visual input. The suffixes above can also be combined with RAM environments (e.g. Pong-ramDeterministic-v4).
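A quick way to see the difference is to compare the observation spaces of the two environment types (a small sketch; the exact printed representation depends on your Gym version).

import gym

ram_env = gym.make('Pong-ram-v0')
print(ram_env.observation_space)    # 128-byte RAM vector

image_env = gym.make('Pong-v0')
print(image_env.observation_space)  # 210 x 160 x 3 RGB frame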