Overview

The object of the game is to hit as many targets as possible without being shot down or running out of fuel, which can be replenished, paradoxically, by blowing up fuel drums.[19]

There are two fortresses to fly through, with an outer space segment between them. At the end of the second fortress is a boss in the form of the Zaxxon robot.

The player’s ship casts a shadow to indicate its height.[20] An altimeter is also displayed; in space there is nothing for the ship to cast a shadow on.[21] The walls at the entrance and exit of each fortress have openings that the ship must be at the right altitude to pass through. Within each fortress are additional walls that the ship’s shadow and altimeter aid in flying over successfully.

Description from Wikipedia

State of the Art

Human Starts

Result    Method               Type    Score from
37672.0   ApeX DQN             DQN     Distributed Prioritized Experience Replay
24622.0   A3C FF (4 days)      PG      Asynchronous Methods for Deep Reinforcement Learning
23519.0   A3C LSTM             PG      Asynchronous Methods for Deep Reinforcement Learning
19658.0   RainbowDQN           DQN     Rainbow: Combining Improvements in Deep Reinforcement Learning
15130.0   DistributionalDQN    DQN     Rainbow: Combining Improvements in Deep Reinforcement Learning
11320.0   DuelingPERDQN        DQN     Dueling Network Architectures for Deep Reinforcement Learning
10164.0   DuelingDDQN          DQN     Dueling Network Architectures for Deep Reinforcement Learning
9501.0    PERDDQN (prop)       DQN     Prioritized Experience Replay
9474.0    PERDDQN (rank)       DQN     Prioritized Experience Replay
8593.0    DDQN                 DQN     Deep Reinforcement Learning with Double Q-learning
8443.0    Human                Human   Massively Parallel Methods for Deep Reinforcement Learning
7650.5    NoisyNetDQN          DQN     Rainbow: Combining Improvements in Deep Reinforcement Learning
6159.4    GorilaDQN            DQN     Massively Parallel Methods for Deep Reinforcement Learning
5901.0    PERDQN (rank)        DQN     Prioritized Experience Replay
4412.0    DQN2015              DQN     Dueling Network Architectures for Deep Reinforcement Learning
2659.0    A3C FF (1 day)       PG      Asynchronous Methods for Deep Reinforcement Learning
831.0     DQN2015              DQN     Massively Parallel Methods for Deep Reinforcement Learning
475.0     Random               Random  Massively Parallel Methods for Deep Reinforcement Learning

No-op Starts

Result    Method               Type    Score from
42285.5   ApeX DQN             DQN     Distributed Prioritized Experience Replay
22209.5   RainbowDQN           DQN     Rainbow: Combining Improvements in Deep Reinforcement Learning
18347.5   DistributionalDQN    DQN     Rainbow: Combining Improvements in Deep Reinforcement Learning
17448.0   ACKTR                PG      Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation
16544.0   A3C                  PG      Noisy Networks for Exploration
14874.0   NoisyNet-DuelingDQN  DQN     Noisy Networks for Exploration
14402.0   DDQN+PopArt          DQN     Learning values across many orders of magnitude
13959.0   DuelingDQN           DQN     Noisy Networks for Exploration
13886.0   DuelingPERDQN        DQN     Dueling Network Architectures for Deep Reinforcement Learning
13490.0   PERDDQN (prop)       DQN     Rainbow: Combining Improvements in Deep Reinforcement Learning
12944.0   DuelingDDQN          DQN     Dueling Network Architectures for Deep Reinforcement Learning
10513.0   C51                  Misc    A Distributional Perspective on Reinforcement Learning
10469.0   PER                  DQN     Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation
10469.0   PERDDQN (rank)       DQN     Dueling Network Architectures for Deep Reinforcement Learning
10182.0   DDQN                 DQN     Deep Reinforcement Learning with Double Q-learning
10163.0   DDQN                 DQN     Dueling Network Architectures for Deep Reinforcement Learning
9390.0    NoisyNetDQN          DQN     Rainbow: Combining Improvements in Deep Reinforcement Learning
9173      Human                Human   Human-level control through deep reinforcement learning
7129.33   GorilaDQN            DQN     Massively Parallel Methods for Deep Reinforcement Learning
6920.0    NoisyNet-DQN         DQN     Noisy Networks for Exploration
5363.0    DQN2015              DQN     Dueling Network Architectures for Deep Reinforcement Learning
4977      DQN2015              DQN     Human-level control through deep reinforcement learning
4806.0    DQN                  DQN     Noisy Networks for Exploration
3365      Linear               Misc    Human-level control through deep reinforcement learning
1324.0    NoisyNet-A3C         PG      Noisy Networks for Exploration
32.5      Random               Random  Human-level control through deep reinforcement learning
21.4      Contingency          Misc    Human-level control through deep reinforcement learning
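
The No-op Starts table lists both a Random and a Human baseline, so the raw scores can be put on a common scale. The sketch below is illustrative only: it assumes the human-normalized score convention that is common in the Atari literature, 100 * (agent - random) / (human - random), which this page does not itself report. The function and variable names are hypothetical; the numbers are copied from the table above.

```python
def human_normalized_score(agent, random, human):
    """Return 100 * (agent - random) / (human - random).

    0 corresponds to the random baseline and 100 to the human baseline.
    """
    if human == random:
        raise ValueError("human and random baselines must differ")
    return 100.0 * (agent - random) / (human - random)


# Baselines copied from the No-op Starts table (assumed comparable).
RANDOM_SCORE = 32.5
HUMAN_SCORE = 9173.0

# A few agent scores copied from the same table.
agents = {"ApeX DQN": 42285.5, "RainbowDQN": 22209.5, "DQN2015": 4977.0}

for name, score in agents.items():
    pct = human_normalized_score(score, RANDOM_SCORE, HUMAN_SCORE)
    print(f"{name}: {pct:.1f}% of human")
```

Any value above 100 means the agent outscored the listed human baseline. Because the tables draw baselines from different papers, normalized values should only be compared within one start condition.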

Normal Starts

Result    Method               Type    Score from
5008.7    PPO                  PG      Proximal Policy Optimization Algorithms
29.0      ACER                 PG      Proximal Policy Optimization Algorithms
16.3      A2C                  PG      Proximal Policy Optimization Algorithms