Overview

The player is represented by a small, “somewhat humanoid head” at the bottom of the screen, later depicted as a caped, elf-like character on the Atari 2600, Atari 5200, and Atari 7800 cartridge artwork (though described as a garden gnome in the trivia section of the cell phone version). The player moves the character around the bottom area of the screen with a trackball and fires laser shots at a centipede advancing from the top of the screen down through a field of mushrooms. Shooting any section of the centipede creates a mushroom; shooting one of the middle segments splits the centipede into two pieces at that point. Each piece then continues independently down the board, with the first section of the rear piece becoming a new head. If the head is destroyed, the section behind it becomes the next head.
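
The splitting rule is the core mechanic, and the sketch below illustrates it under an assumed list-of-segments representation. The Segment class and shoot_segment helper are illustrative names, not taken from any actual Centipede implementation.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Segment:
    x: int
    y: int
    is_head: bool = False

def shoot_segment(body: List[Segment], hit: int) -> Tuple[List[List[Segment]], Tuple[int, int]]:
    """Destroy segment `hit`: a mushroom appears at its position and the
    centipede splits into the piece in front of it and the piece behind it,
    whose first segment is promoted to a new head."""
    mushroom_at = (body[hit].x, body[hit].y)              # shot segment becomes a mushroom
    front, rear = body[:hit], body[hit + 1:]
    if rear:
        rear[0].is_head = True                            # rear piece gets a new head
    pieces = [piece for piece in (front, rear) if piece]  # drop empty pieces
    return pieces, mushroom_at
```

For example, hitting the third segment of a five-segment centipede leaves a two-segment piece in front, a two-segment piece behind led by a new head, and a mushroom where the shot landed.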

The centipede starts at the top of the screen, traveling either left or right. When it hits a mushroom or the edge of the screen, it drops one level and reverses direction; more mushrooms on the screen therefore make the centipede descend more rapidly. The player can destroy mushrooms by shooting them, but each takes four hits.
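
A minimal sketch of the descent rule and the four-hit mushrooms, assuming a coarse integer grid; GRID_WIDTH and the dict-based mushroom store are assumptions made only for illustration.

```python
GRID_WIDTH = 30  # assumed grid width, not the actual arcade playfield size

def step_head(x, y, direction, mushrooms):
    """Advance a centipede head one cell; `direction` is +1 (right) or -1 (left).
    Hitting a mushroom or the screen edge drops the head one row and reverses it."""
    nx = x + direction
    if nx < 0 or nx >= GRID_WIDTH or (nx, y) in mushrooms:
        return x, y + 1, -direction   # drop one level, switch direction
    return nx, y, direction

def shoot_mushroom(mushrooms, pos):
    """Register one hit on the mushroom at `pos`; it is destroyed on the fourth hit.
    `mushrooms` maps position -> hits taken so far."""
    mushrooms[pos] = mushrooms.get(pos, 0) + 1
    if mushrooms[pos] >= 4:
        del mushrooms[pos]
        return True    # destroyed
    return False       # damaged but still standing
```

The more mushrooms on the screen, the more often step_head triggers a drop, which is why a crowded field makes the centipede descend faster.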

If the centipede reaches the bottom of the screen, it moves back and forth within the player area and one-segment “head” centipedes are periodically added. This continues until the player has eliminated both the original centipede and all heads. When all the centipede’s segments are destroyed, a new centipede forms at the top of the screen. Every time a centipede is eliminated, however, the next one is one segment shorter and is accompanied by one additional, fast-moving “head” centipede.
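
The wave progression (one segment shorter, one more fast head each time) can be summarized as below. The 12-segment starting length is an assumption about the arcade original, not something stated in this description.

```python
def wave_composition(wave, initial_length=12):
    """Return (length of the main centipede, number of extra lone heads)
    for wave 0, 1, 2, ...: the body loses one segment per wave and one
    additional fast-moving head joins each wave."""
    extra_heads = wave
    body_length = max(initial_length - wave, 1)   # never shorter than a lone head
    return body_length, extra_heads
```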

The player is also menaced by other creatures besides the centipedes. Fleas drop vertically, leaving additional mushrooms in their path; they appear when fewer than five mushrooms remain in the player movement area, though the number required increases with the difficulty level. Spiders move across the player area in a zig-zag fashion and occasionally eat some of the mushrooms. Scorpions move horizontally across the screen, poisoning every mushroom they touch, but they never enter the player movement region. A centipede touching a poisoned mushroom hurtles straight down toward the player area, then returns to normal behavior upon reaching it.
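
Two of these rules are simple enough to state as predicates; the sketch below is illustrative only, with the flea threshold of five taken from the description above.

```python
def should_spawn_flea(mushrooms_in_player_area, threshold=5):
    """Fleas appear while the player area holds fewer mushrooms than the
    threshold (which rises with the difficulty level)."""
    return mushrooms_in_player_area < threshold

def centipede_move_mode(touching_poisoned_mushroom, reached_player_area):
    """A centipede that touches a poisoned mushroom dives straight down until
    it reaches the player area, after which it moves normally again."""
    if touching_poisoned_mushroom and not reached_player_area:
        return "dive"
    return "normal"
```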

A player loses a life when hit by a centipede or another enemy, such as a spider or a flea, after which any poisoned or partially damaged mushrooms revert to normal and points are awarded for each regenerated mushroom. The game ends when all lives are lost.
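
A minimal sketch of this end-of-life cleanup, assuming the same position-to-hits mushroom store as above; the 5 points per regenerated mushroom is an assumption, since the description only says that points are awarded.

```python
def on_life_lost(mushrooms, poisoned, points_per_mushroom=5):
    """Repair damaged mushrooms, cure poisoned ones, and return the points
    awarded for the regeneration. `mushrooms` maps position -> hits taken
    (0 means undamaged); `poisoned` is a set of poisoned positions."""
    score = 0
    for pos, hits in mushrooms.items():
        if hits > 0 or pos in poisoned:
            score += points_per_mushroom   # assumed per-mushroom award
        mushrooms[pos] = 0                 # fully repaired
    poisoned.clear()
    return score
```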

Description from Wikipedia

State of the Art

Human Starts

| Result | Method | Type | Score from |
|---|---|---|---|
| 10321.9 | Human | Human | Massively Parallel Methods for Deep Reinforcement Learning |
| 7476.9 | DistributionalDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 7160.9 | RainbowDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 6296.87 | GorilaDQN | DQN | Massively Parallel Methods for Deep Reinforcement Learning |
| 5711.6 | ApeX DQN | DQN | Distributed Prioritized Experience Replay |
| 5570.2 | DuelingPERDQN | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 4881.0 | DuelingDDQN | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 4214.4 | NoisyNetDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 3973.9 | DQN2015 | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 3853.5 | DDQN | DQN | Deep Reinforcement Learning with Double Q-learning |
| 3773.1 | DQN2015 | DQN | Massively Parallel Methods for Deep Reinforcement Learning |
| 3755.8 | A3C FF (4 days) | PG | Asynchronous Methods for Deep Reinforcement Learning |
| 3489.1 | PERDDQN (rank) | DQN | Prioritized Experience Replay |
| 3421.9 | PERDDQN (prop) | DQN | Prioritized Experience Replay |
| 3306.5 | A3C FF (1 day) | PG | Asynchronous Methods for Deep Reinforcement Learning |
| 2959.4 | PERDQN (rank) | DQN | Prioritized Experience Replay |
| 1997.0 | A3C LSTM | PG | Asynchronous Methods for Deep Reinforcement Learning |
| 1925.5 | Random | Random | Massively Parallel Methods for Deep Reinforcement Learning |

No-op Starts

| Result | Method | Type | Score from |
|---|---|---|---|
| 49065.8 | DDQN+PopArt | DQN | Learning values across many orders of magnitude |
| 12974.0 | ApeX DQN | DQN | Distributed Prioritized Experience Replay |
| 12017.0 | Human | Human | Dueling Network Architectures for Deep Reinforcement Learning |
| 11963 | Human | Human | Human-level control through deep reinforcement learning |
| 9646.0 | C51 | Misc | A Distributional Perspective on Reinforcement Learning |
| 9015.5 | DistributionalDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 8803 | Linear | Misc | Human-level control through deep reinforcement learning |
| 8432.3 | GorilaDQN | DQN | Massively Parallel Methods for Deep Reinforcement Learning |
| 8309 | DQN2015 | DQN | Human-level control through deep reinforcement learning |
| 8282.0 | NoisyNet-A3C | PG | Noisy Networks for Exploration |
| 8167.3 | RainbowDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 7687.5 | DuelingPERDQN | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 7596.0 | NoisyNet-DuelingDQN | DQN | Noisy Networks for Exploration |
| 7561.4 | DuelingDDQN | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 7125.28 | ACKTR | PG | Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation |
| 6440.0 | DQN | DQN | Noisy Networks for Exploration |
| 5409.4 | DDQN | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 5350.0 | A3C | PG | Noisy Networks for Exploration |
| 5175.4 | PERDDQN (prop) | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 4657.7 | DQN2015 | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 4647 | Contingency | Misc | Human-level control through deep reinforcement learning |
| 4463.2 | PER | DQN | Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation |
| 4463.2 | PERDDQN (rank) | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 4355.8 | NoisyNetDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 4269.0 | NoisyNet-DQN | DQN | Noisy Networks for Exploration |
| 4166.0 | DuelingDQN | DQN | Noisy Networks for Exploration |
| 4139.4 | DDQN | DQN | Deep Reinforcement Learning with Double Q-learning |
| 2091 | Random | Random | Human-level control through deep reinforcement learning |

Normal Starts

| Result | Method | Type | Score from |
|---|---|---|---|
| 8904.8 | ACER | PG | Proximal Policy Optimization Algorithms |
| 4386.4 | PPO | PG | Proximal Policy Optimization Algorithms |
| 3496.5 | A2C | PG | Proximal Policy Optimization Algorithms |