Overview

In the game, the player controls a scuba diver who must protect a treasure from an octopus at the top of the screen; the octopus tries to capture the treasure with its tentacles. Meanwhile, a great white shark tries to distract the diver by swimming back and forth toward the bottom of the screen.

The diver loses a life if he is captured by the shark or the octopus's tentacles, or if his air meter runs out. The diver can refill the air meter by touching a long pole that extends from a boat appearing from time to time.

Description from Wikipedia

State of the Art

Human Starts

| Result | Method | Type | Score from |
|--------|--------|------|------------|
| 23829.9 | ApeX DQN | DQN | Distributed Prioritized Experience Replay |
| 13637.9 | DuelingPERDQN | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 12093.7 | A3C LSTM | PG | Asynchronous Methods for Deep Reinforcement Learning |
| 11836.1 | PERDDQN (prop) | DQN | Prioritized Experience Replay |
| 11686.5 | RainbowDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 11382.3 | DistributionalDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 11185.1 | DuelingDDQN | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 10497.6 | PERDDQN (rank) | DQN | Prioritized Experience Replay |
| 10476.1 | A3C FF (4 days) | PG | Asynchronous Methods for Deep Reinforcement Learning |
| 9238.5 | GorilaDQN | DQN | Massively Parallel Methods for Deep Reinforcement Learning |
| 8960.3 | DDQN | DQN | Deep Reinforcement Learning with Double Q-learning |
| 8738.5 | PERDQN (rank) | DQN | Prioritized Experience Replay |
| 7186.4 | NoisyNetDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 6796.0 | Human | Human | Massively Parallel Methods for Deep Reinforcement Learning |
| 6738.8 | DQN2015 | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 5614.0 | A3C FF (1 day) | PG | Asynchronous Methods for Deep Reinforcement Learning |
| 5439.9 | DQN2015 | DQN | Massively Parallel Methods for Deep Reinforcement Learning |
| 1747.8 | Random | Random | Massively Parallel Methods for Deep Reinforcement Learning |

No-op Starts

| Result | Method | Type | Score from |
|--------|--------|------|------------|
| 25783.3 | ApeX DQN | DQN | Distributed Prioritized Experience Replay |
| 15851.2 | DDQN+PopArt | DQN | Learning values across many orders of magnitude |
| 15572.5 | DuelingPERDQN | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 13439.4 | PERDDQN (prop) | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 13136.0 | RainbowDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 12983.6 | DistributionalDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 12542.0 | C51 | Misc | A Distributional Perspective on Reinforcement Learning |
| 12270.5 | PERDDQN (rank) | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 12211.0 | NoisyNet-DuelingDQN | DQN | Noisy Networks for Exploration |
| 11971.1 | DuelingDDQN | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 10616.0 | DDQN | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 9919.0 | DuelingDQN | DQN | Noisy Networks for Exploration |
| 8798.0 | NoisyNet-A3C | PG | Noisy Networks for Exploration |
| 8716.8 | DuelingPERDDQN | DQN | Deep Q-Learning from Demonstrations |
| 8332.4 | NoisyNetDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 8207.8 | DQN2015 | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 8181.0 | NoisyNet-DQN | DQN | Noisy Networks for Exploration |
| 8179.0 | DQN | DQN | Noisy Networks for Exploration |
| 8049.0 | Human | Human | Dueling Network Architectures for Deep Reinforcement Learning |
| 7257 | DQN2015 | DQN | Human-level control through deep reinforcement learning |
| 7168.0 | A3C | PG | Noisy Networks for Exploration |
| 6997.1 | DDQN | DQN | Deep Reinforcement Learning with Double Q-learning |
| 6182.16 | GorilaDQN | DQN | Massively Parallel Methods for Deep Reinforcement Learning |
| 5188.3 | DQfD | Imitation | Deep Q-Learning from Demonstrations |
| 4076 | Human | Human | Human-level control through deep reinforcement learning |
| 2500 | Linear | Misc | Human-level control through deep reinforcement learning |
| 2292 | Random | Random | Human-level control through deep reinforcement learning |
| 2247 | Contingency | Misc | Human-level control through deep reinforcement learning |

Normal Starts

| Result | Method | Type | Score from |
|--------|--------|------|------------|
| 8488.0 | ACER | PG | Proximal Policy Optimization Algorithms |
| 6254.9 | PPO | PG | Proximal Policy Optimization Algorithms |
| 5961.2 | A2C | PG | Proximal Policy Optimization Algorithms |