Overview

The player controls a green stick man. Using a joystick and a firing button that activates a laser-like weapon, the player navigates a simple maze filled with many robots, which fire lasers back at the player character. The player can be killed by being shot, by running into a robot or an exploding robot, by coming into contact with the electrified walls of the maze itself, or by being touched by the player’s nemesis, Evil Otto.

The function of Evil Otto, represented by a bouncing smiley face, is to quicken the pace of the game. Otto is unusual among games of the period in that he cannot be killed. He can pass through walls with impunity and is drawn toward the player character. While robots remain in the maze, Otto moves slowly, about half as fast as the humanoid, but he speeds up to match the humanoid’s speed once all the robots are killed. Evil Otto moves at exactly the same speed as the player horizontally, but faster than the player vertically; thus, no matter how close Otto is, the player can escape as long as they avoid moving straight up or down.

The player advances by escaping from the maze through an opening in the far wall. Each robot destroyed is worth 50 points. Ideally, all the robots in the current maze are destroyed before the player escapes, earning a per-maze bonus of ten points per robot. The game has 65,536 rooms (a 256×256 grid), but due to limitations of the random number generator there are fewer than 1,024 maze layouts (876 unique). The game has only one controller, but two-player games can be played by alternating at the joystick. The game is most difficult when the player enters a new maze, as there is only a short interval between entering the maze and all the robots in range opening fire at the player. For a beginner, this often means several deaths in rapid succession, since each death means starting in a new maze layout.
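As a worked example of the scoring rule above, here is a minimal sketch that computes the points earned in a single maze. The 50-point kill value and ten-point-per-robot bonus come from the description; the function itself is only illustrative, not the game's actual code.

```python
def maze_points(robots_destroyed: int, all_robots_destroyed: bool) -> int:
    """Points earned in one maze under the scoring described above (illustrative)."""
    points = 50 * robots_destroyed          # 50 points per robot destroyed
    if all_robots_destroyed:
        points += 10 * robots_destroyed     # per-maze bonus: ten points per robot
    return points

# Clearing a maze of 8 robots before escaping: 8 * 50 + 8 * 10 = 480 points.
print(maze_points(8, all_robots_destroyed=True))   # 480
```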

As the player’s score increases, the colors of the enemy robots change, and the robots can have more bullets on the screen at the same time. Once they reach the limit of simultaneous onscreen bullets, they cannot fire again until one or more of their bullets detonates; the limit applies to the robots as a group, not to individual robots.

Description from Wikipedia
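For experiments against the scores below, the following is a minimal sketch of loading Berzerk as a reinforcement learning environment. It assumes the Gymnasium and ale-py packages and the ALE/Berzerk-v5 environment id, none of which are part of the description above.

```python
import gymnasium as gym
import ale_py

gym.register_envs(ale_py)  # register ALE environments (needed with recent ale-py/Gymnasium releases)

env = gym.make("ALE/Berzerk-v5")
obs, info = env.reset(seed=0)

episode_return = 0.0
terminated = truncated = False
while not (terminated or truncated):
    action = env.action_space.sample()  # random policy placeholder; substitute a trained agent
    obs, reward, terminated, truncated, info = env.step(action)
    episode_return += reward

print("episode return:", episode_return)
env.close()
```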

State of the Art

Human Starts

| Result | Method | Type | Score from |
|---|---|---|---|
| 55598.9 | ApeX DQN | DQN | Distributed Prioritized Experience Replay |
| 2237.5 | Human | Human | Deep Reinforcement Learning with Double Q-learning |
| 2178.6 | DuelingPERDQN | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 1793.4 | RainbowDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 1433.4 | A3C FF (1 day) | PG | Asynchronous Methods for Deep Reinforcement Learning |
| 1165.6 | PERDDQN (prop) | DQN | Prioritized Experience Replay |
| 1011.1 | DDQN | DQN | Deep Reinforcement Learning with Double Q-learning |
| 1000.0 | DistributionalDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 910.6 | DuelingDDQN | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 865.9 | PERDDQN (rank) | DQN | Prioritized Experience Replay |
| 862.2 | A3C LSTM | PG | Asynchronous Methods for Deep Reinforcement Learning |
| 817.9 | A3C FF (4 days) | PG | Asynchronous Methods for Deep Reinforcement Learning |
| 670.8 | NoisyNetDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 644.0 | PERDQN (rank) | DQN | Prioritized Experience Replay |
| 493.4 | DQN2015 | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 196.1 | Random | Random | Deep Reinforcement Learning with Double Q-learning |
| -9001.0 | DQN2015 | DQN | Asynchronous Methods for Deep Reinforcement Learning |
| -9001.0 | GorilaDQN | DQN | Asynchronous Methods for Deep Reinforcement Learning |

No-op Starts

| Result | Method | Type | Score from |
|---|---|---|---|
| 57196.7 | ApeX DQN | DQN | Distributed Prioritized Experience Replay |
| 3409.0 | DuelingPERDQN | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 2630.4 | Human | Human | Dueling Network Architectures for Deep Reinforcement Learning |
| 2545.6 | RainbowDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 1896.0 | NoisyNet-DuelingDQN | DQN | Noisy Networks for Exploration |
| 1645.0 | C51 | Misc | A Distributional Perspective on Reinforcement Learning |
| 1614.2 | PERDDQN (prop) | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 1472.6 | DuelingDDQN | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 1421.8 | DistributionalDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 1305.6 | PER | DQN | Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation |
| 1305.6 | PERDDQN (rank) | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 1235.0 | NoisyNet-A3C | PG | Noisy Networks for Exploration |
| 1225.4 | DDQN | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 1199.6 | DDQN+PopArt | DQN | Learning values across many orders of magnitude |
| 1122.0 | DuelingDQN | DQN | Noisy Networks for Exploration |
| 1022.0 | A3C | PG | Noisy Networks for Exploration |
| 927.2 | ACKTR | PG | Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation |
| 905.0 | NoisyNet-DQN | DQN | Noisy Networks for Exploration |
| 837.3 | NoisyNetDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 634.0 | DQN | DQN | Noisy Networks for Exploration |
| 585.6 | DQN2015 | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 123.7 | Random | Random | Dueling Network Architectures for Deep Reinforcement Learning |
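
The table labels refer to the two standard Atari evaluation protocols: under "Human Starts", episodes begin from states sampled from human play, while under "No-op Starts" they begin after a random number of no-op actions (up to 30 in the DQN literature). Below is a minimal sketch of the no-op starts protocol, assuming the same Gymnasium/ale-py setup and environment id as above; the agent is stubbed out with a random policy.

```python
import random

import gymnasium as gym
import ale_py

gym.register_envs(ale_py)

NOOP_ACTION = 0   # index 0 is NOOP in ALE's action set
MAX_NOOPS = 30    # the DQN literature applies up to 30 random no-ops at the start

def evaluate_noop_start(env: gym.Env, rng: random.Random) -> float:
    """Run one evaluation episode that starts with a random number of no-op actions."""
    obs, info = env.reset()
    for _ in range(rng.randint(1, MAX_NOOPS)):
        obs, reward, terminated, truncated, info = env.step(NOOP_ACTION)
        if terminated or truncated:
            obs, info = env.reset()
    episode_return = 0.0
    terminated = truncated = False
    while not (terminated or truncated):
        action = env.action_space.sample()  # substitute the evaluated agent's policy here
        obs, reward, terminated, truncated, info = env.step(action)
        episode_return += reward
    return episode_return

env = gym.make("ALE/Berzerk-v5")
print(evaluate_noop_start(env, random.Random(0)))
env.close()
```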