Overview

The player is represented by a small, “somewhat humanoid head” at the bottom of the screen, later depicted as a caped, elf-like character on the Atari 2600, Atari 5200, and Atari 7800 cartridge artwork (though described as a garden gnome in the trivia section of the cell phone version). The player moves the character around the bottom area of the screen with a trackball and fires laser shots at a centipede advancing from the top of the screen down through a field of mushrooms. Shooting any section of the centipede creates a mushroom; shooting one of the middle segments splits the centipede into two pieces at that point. Each piece then continues independently down the board, with the first section of the rear piece becoming a new head. If the head is destroyed, the section behind it becomes the next head.
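
The splitting rule is the core mechanic, and the sketch below illustrates it under an assumed list-of-segments representation. The Segment class and shoot_segment helper are illustrative names, not taken from any actual Centipede implementation.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Segment:
    x: int
    y: int
    is_head: bool = False

def shoot_segment(body: List[Segment], hit: int) -> Tuple[List[List[Segment]], Tuple[int, int]]:
    """Destroy segment `hit`: a mushroom appears at its position and the
    centipede splits into the piece in front of it and the piece behind it,
    whose first segment is promoted to a new head."""
    mushroom_at = (body[hit].x, body[hit].y)              # shot segment becomes a mushroom
    front, rear = body[:hit], body[hit + 1:]
    if rear:
        rear[0].is_head = True                            # rear piece gets a new head
    pieces = [piece for piece in (front, rear) if piece]  # drop empty pieces
    return pieces, mushroom_at
```

For example, hitting the third segment of a five-segment centipede leaves a two-segment piece in front, a two-segment piece behind led by a new head, and a mushroom where the shot landed.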

The centipede starts at the top of the screen, traveling either left or right. When it hits a mushroom or the edge of the screen, it drops one level and reverses direction; more mushrooms on the screen therefore make the centipede descend more rapidly. The player can destroy mushrooms by shooting them, but each takes four hits.
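
A minimal sketch of the descent rule and the four-hit mushrooms, assuming a coarse integer grid; GRID_WIDTH and the dict-based mushroom store are assumptions made only for illustration.

```python
GRID_WIDTH = 30  # assumed grid width, not the actual arcade playfield size

def step_head(x, y, direction, mushrooms):
    """Advance a centipede head one cell; `direction` is +1 (right) or -1 (left).
    Hitting a mushroom or the screen edge drops the head one row and reverses it."""
    nx = x + direction
    if nx < 0 or nx >= GRID_WIDTH or (nx, y) in mushrooms:
        return x, y + 1, -direction   # drop one level, switch direction
    return nx, y, direction

def shoot_mushroom(mushrooms, pos):
    """Register one hit on the mushroom at `pos`; it is destroyed on the fourth hit.
    `mushrooms` maps position -> hits taken so far."""
    mushrooms[pos] = mushrooms.get(pos, 0) + 1
    if mushrooms[pos] >= 4:
        del mushrooms[pos]
        return True    # destroyed
    return False       # damaged but still standing
```

The more mushrooms on the screen, the more often step_head triggers a drop, which is why a crowded field makes the centipede descend faster.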

If the centipede reaches the bottom of the screen, it moves back and forth within the player area and one-segment “head” centipedes are periodically added. This continues until the player has eliminated both the original centipede and all heads. When all the centipede’s segments are destroyed, a new centipede forms at the top of the screen. Every time a centipede is eliminated, however, the next one is one segment shorter and is accompanied by one additional, fast-moving “head” centipede.
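
The wave progression (one segment shorter, one more fast head each time) can be summarized as below. The 12-segment starting length is an assumption about the arcade original, not something stated in this description.

```python
def wave_composition(wave, initial_length=12):
    """Return (length of the main centipede, number of extra lone heads)
    for wave 0, 1, 2, ...: the body loses one segment per wave and one
    additional fast-moving head joins each wave."""
    extra_heads = wave
    body_length = max(initial_length - wave, 1)   # never shorter than a lone head
    return body_length, extra_heads
```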

The player is also menaced by other creatures besides the centipedes. Fleas drop vertically, leaving additional mushrooms in their path; they appear when fewer than five mushrooms remain in the player movement area, though the number required increases with the difficulty level. Spiders move across the player area in a zig-zag fashion and occasionally eat some of the mushrooms. Scorpions move horizontally across the screen, poisoning every mushroom they touch, but they never enter the player movement region. A centipede touching a poisoned mushroom hurtles straight down toward the player area, then returns to normal behavior upon reaching it.
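
Two of these rules are simple enough to state as predicates; the sketch below is illustrative only, with the flea threshold of five taken from the description above.

```python
def should_spawn_flea(mushrooms_in_player_area, threshold=5):
    """Fleas appear while the player area holds fewer mushrooms than the
    threshold (which rises with the difficulty level)."""
    return mushrooms_in_player_area < threshold

def centipede_move_mode(touching_poisoned_mushroom, reached_player_area):
    """A centipede that touches a poisoned mushroom dives straight down until
    it reaches the player area, after which it moves normally again."""
    if touching_poisoned_mushroom and not reached_player_area:
        return "dive"
    return "normal"
```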

A player loses a life when hit by a centipede or another enemy, such as a spider or a flea, after which any poisoned or partially damaged mushrooms revert to normal and points are awarded for each regenerated mushroom. The game ends when all lives are lost.
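
A minimal sketch of this end-of-life cleanup, assuming the same position-to-hits mushroom store as above; the 5 points per regenerated mushroom is an assumption, since the description only says that points are awarded.

```python
def on_life_lost(mushrooms, poisoned, points_per_mushroom=5):
    """Repair damaged mushrooms, cure poisoned ones, and return the points
    awarded for the regeneration. `mushrooms` maps position -> hits taken
    (0 means undamaged); `poisoned` is a set of poisoned positions."""
    score = 0
    for pos, hits in mushrooms.items():
        if hits > 0 or pos in poisoned:
            score += points_per_mushroom   # assumed per-mushroom award
        mushrooms[pos] = 0                 # fully repaired
    poisoned.clear()
    return score
```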

Description from Wikipedia

State of the Art

Human Starts

| Result | Method | Type | Score from |
|---|---|---|---|
| 10321.9 | Human | Human | Massively Parallel Methods for Deep Reinforcement Learning |
| 7476.9 | DistributionalDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 7160.9 | RainbowDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 6296.87 | GorilaDQN | DQN | Massively Parallel Methods for Deep Reinforcement Learning |
| 5711.6 | ApeX DQN | DQN | Distributed Prioritized Experience Replay |
| 5570.2 | DuelingPERDQN | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 4881.0 | DuelingDDQN | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 4214.4 | NoisyNetDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 3973.9 | DQN2015 | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 3853.5 | DDQN | DQN | Deep Reinforcement Learning with Double Q-learning |
| 3773.1 | DQN2015 | DQN | Massively Parallel Methods for Deep Reinforcement Learning |
| 3755.8 | A3C FF (4 days) | PG | Asynchronous Methods for Deep Reinforcement Learning |
| 3489.1 | PERDDQN (rank) | DQN | Prioritized Experience Replay |
| 3421.9 | PERDDQN (prop) | DQN | Prioritized Experience Replay |
| 3306.5 | A3C FF (1 day) | PG | Asynchronous Methods for Deep Reinforcement Learning |
| 2959.4 | PERDQN (rank) | DQN | Prioritized Experience Replay |
| 1997.0 | A3C LSTM | PG | Asynchronous Methods for Deep Reinforcement Learning |
| 1925.5 | Random | Random | Massively Parallel Methods for Deep Reinforcement Learning |

No-op Starts

| Result | Method | Type | Score from |
|---|---|---|---|
| 49065.8 | DDQN+PopArt | DQN | Learning values across many orders of magnitude |
| 12974.0 | ApeX DQN | DQN | Distributed Prioritized Experience Replay |
| 12017.0 | Human | Human | Dueling Network Architectures for Deep Reinforcement Learning |
| 11963 | Human | Human | Human-level control through deep reinforcement learning |
| 9646.0 | C51 | Misc | A Distributional Perspective on Reinforcement Learning |
| 9015.5 | DistributionalDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 8803 | Linear | Misc | Human-level control through deep reinforcement learning |
| 8432.3 | GorilaDQN | DQN | Massively Parallel Methods for Deep Reinforcement Learning |
| 8309 | DQN2015 | DQN | Human-level control through deep reinforcement learning |
| 8282.0 | NoisyNet-A3C | PG | Noisy Networks for Exploration |
| 8167.3 | RainbowDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 7687.5 | DuelingPERDQN | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 7596.0 | NoisyNet-DuelingDQN | DQN | Noisy Networks for Exploration |
| 7561.4 | DuelingDDQN | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 7125.28 | ACKTR | PG | Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation |
| 6440.0 | DQN | DQN | Noisy Networks for Exploration |
| 5409.4 | DDQN | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 5350.0 | A3C | PG | Noisy Networks for Exploration |
| 5175.4 | PERDDQN (prop) | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 4657.7 | DQN2015 | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 4647 | Contingency | Misc | Human-level control through deep reinforcement learning |
| 4463.2 | PER | DQN | Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation |
| 4463.2 | PERDDQN (rank) | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 4355.8 | NoisyNetDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 4269.0 | NoisyNet-DQN | DQN | Noisy Networks for Exploration |
| 4166.0 | DuelingDQN | DQN | Noisy Networks for Exploration |
| 4139.4 | DDQN | DQN | Deep Reinforcement Learning with Double Q-learning |
| 2091 | Random | Random | Human-level control through deep reinforcement learning |

Normal Starts

| Result | Method | Type | Score from |
|---|---|---|---|
| 8904.8 | ACER | PG | Proximal Policy Optimization Algorithms |
| 4386.4 | PPO | PG | Proximal Policy Optimization Algorithms |
| 3496.5 | A2C | PG | Proximal Policy Optimization Algorithms |