Overview

The player flies a fighter jet over the River of No Return in a raid behind enemy lines, viewed from a top-down perspective. The jet can only move left and right; it cannot maneuver up and down the screen, but it can accelerate and decelerate. The jet crashes if it collides with the riverbank or an enemy craft, or if it runs out of fuel. As long as the player keeps refueling and evades damage, gameplay is essentially unlimited.

The player scores points for shooting enemy tankers (30 points), helicopters (60), fuel depots (80), jets (100), and bridges (500). The jet refuels when it flies over a fuel depot, and a bridge marks the end of a game level. Ports beyond the Atari 2600 add hot-air balloons, worth 60 points when shot, as well as tanks along the sides of the river that fire at the player's jet.

Destroyed bridges also serve as the game's checkpoints: if the player crashes, the next life starts at the last destroyed bridge.
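The scoring rules above can be summarized as a small lookup table. The sketch below is purely illustrative; the names and function are our own and do not correspond to any actual implementation of the game or emulator:

```python
# Point values per destroyed target, as described in the overview.
POINTS = {
    "tanker": 30,
    "helicopter": 60,
    "fuel_depot": 80,
    "jet": 100,
    "bridge": 500,
    "balloon": 60,  # hot-air balloons appear only in non-Atari-2600 ports
}

def score(destroyed):
    """Total points earned for a sequence of destroyed targets."""
    return sum(POINTS[target] for target in destroyed)

# Example: shooting a tanker and then a bridge earns 30 + 500 = 530 points.
print(score(["tanker", "bridge"]))  # -> 530
```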

Description from Wikipedia

State of the Art

Human Starts

| Result  | Method          | Type   | Score from                                                   |
| ------- | --------------- | ------ | ------------------------------------------------------------ |
| 49982.8 | ApeX DQN        | DQN    | Distributed Prioritized Experience Replay                     |
| 18184.4 | PERDDQN (prop)  | DQN    | Prioritized Experience Replay                                 |
| 16569.4 | DuelingDDQN     | DQN    | Dueling Network Architectures for Deep Reinforcement Learning |
| 16496.8 | DuelingPERDQN   | DQN    | Dueling Network Architectures for Deep Reinforcement Learning |
| 14382.2 | Human           | Human  | Massively Parallel Methods for Deep Reinforcement Learning    |
| 12201.8 | A3C FF (4 days) | PG     | Asynchronous Methods for Deep Reinforcement Learning          |
| 11807.2 | PERDDQN (rank)  | DQN    | Prioritized Experience Replay                                 |
| 10838.4 | DDQN            | DQN    | Deep Reinforcement Learning with Double Q-learning            |
| 10205.5 | PERDQN (rank)   | DQN    | Prioritized Experience Replay                                 |
| 10001.2 | A3C FF (1 day)  | PG     | Asynchronous Methods for Deep Reinforcement Learning          |
| 6591.9  | A3C LSTM        | PG     | Asynchronous Methods for Deep Reinforcement Learning          |
| 5310.27 | GorilaDQN       | DQN    | Massively Parallel Methods for Deep Reinforcement Learning    |
| 4748.5  | DQN2015         | DQN    | Dueling Network Architectures for Deep Reinforcement Learning |
| 4065.3  | DQN2015         | DQN    | Massively Parallel Methods for Deep Reinforcement Learning    |
| 588.3   | Random          | Random | Massively Parallel Methods for Deep Reinforcement Learning    |
| 588.3   | Random          | Random | Deep Reinforcement Learning with Double Q-learning            |

No-op Starts

| Result  | Method              | Type      | Score from                                                                                     |
| ------- | ------------------- | --------- | ---------------------------------------------------------------------------------------------- |
| 63864.4 | ApeX DQN            | DQN       | Distributed Prioritized Experience Replay                                                       |
| 23134.0 | NoisyNet-DuelingDQN | DQN       | Noisy Networks for Exploration                                                                  |
| 21162.6 | DuelingDDQN         | DQN       | Dueling Network Architectures for Deep Reinforcement Learning                                   |
| 20607.6 | DuelingPERDQN       | DQN       | Dueling Network Architectures for Deep Reinforcement Learning                                   |
| 18810.0 | DuelingPERDDQN      | DQN       | Deep Q-Learning from Demonstrations                                                             |
| 18735.4 | DQfD                | Imitation | Deep Q-Learning from Demonstrations                                                             |
| 18405.0 | DuelingDQN          | DQN       | Noisy Networks for Exploration                                                                  |
| 17762.8 | ACKTR               | PG        | Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation |
| 17322.0 | C51                 | Misc      | A Distributional Perspective on Reinforcement Learning                                          |
| 17118.0 | Human               | Human     | Dueling Network Architectures for Deep Reinforcement Learning                                   |
| 14884.5 | DDQN                | DQN       | Dueling Network Architectures for Deep Reinforcement Learning                                   |
| 14522.3 | PER                 | DQN       | Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation |
| 14522.3 | PERDDQN (rank)      | DQN       | Dueling Network Architectures for Deep Reinforcement Learning                                   |
| 13513   | Human               | Human     | Human-level control through deep reinforcement learning                                         |
| 12530.8 | DDQN+PopArt         | DQN       | Learning values across many orders of magnitude                                                 |
| 12015.3 | DDQN                | DQN       | Deep Reinforcement Learning with Double Q-learning                                              |
| 9425.0  | NoisyNet-DQN        | DQN       | Noisy Networks for Exploration                                                                  |
| 8344.83 | GorilaDQN           | DQN       | Massively Parallel Methods for Deep Reinforcement Learning                                      |
| 8316    | DQN2015             | DQN       | Human-level control through deep reinforcement learning                                         |
| 8135.0  | A3C                 | PG        | Noisy Networks for Exploration                                                                  |
| 7878.0  | NoisyNet-A3C        | PG        | Noisy Networks for Exploration                                                                  |
| 7377.6  | DQN2015             | DQN       | Dueling Network Architectures for Deep Reinforcement Learning                                   |
| 7241.0  | DQN                 | DQN       | Noisy Networks for Exploration                                                                  |
| 2650    | Contingency         | Misc      | Human-level control through deep reinforcement learning                                         |
| 1904    | Linear              | Misc      | Human-level control through deep reinforcement learning                                         |
| 1339    | Random              | Random    | Human-level control through deep reinforcement learning                                         |
| 1338.5  | Random              | Random    | Learning values across many orders of magnitude                                                 |

Normal Starts

| Result | Method | Type | Score from                               |
| ------ | ------ | ---- | ---------------------------------------- |
| 9125.1 | ACER   | PG   | Proximal Policy Optimization Algorithms  |
| 8393.6 | PPO    | PG   | Proximal Policy Optimization Algorithms  |
| 7653.5 | A2C    | PG   | Proximal Policy Optimization Algorithms  |