## Overview

Seaquest is an underwater shooter in which the player controls a submarine. Enemies include sharks and enemy submarines. The player must ward off the enemies while trying to rescue divers swimming through the water. The sub can hold up to eight divers at a time. Each time the player resurfaces, all rescued divers are dropped off in exchange for points. To add to the challenge, the submarine has a limited supply of oxygen. The player must surface often in order to replenish the oxygen, but if the player resurfaces without any rescued divers, they lose a life. If the player resurfaces with the maximum number of divers, they gain bonus points for the sub's remaining oxygen. Each time the player surfaces, the game's difficulty increases: enemies become more numerous and faster. Eventually an enemy sub begins patrolling the surface, leaving the player without a safe haven.

Description from Wikipedia
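
The papers listed below evaluate agents on Seaquest through the Arcade Learning Environment (ALE). As a minimal sketch (not taken from any of the cited papers), assuming the Gymnasium + ale-py stack (`pip install "gymnasium[atari,accept-rom-license]"`) and the `ALE/Seaquest-v5` environment ID, one episode with a random policy looks like this:

```python
# Minimal sketch of loading Seaquest via Gymnasium's ALE bindings.
# Note: the v5 defaults (frameskip, sticky actions) differ from the exact
# evaluation settings used by the papers in the tables below.
import gymnasium as gym

env = gym.make("ALE/Seaquest-v5")
obs, info = env.reset(seed=0)

total_reward = 0.0
terminated = truncated = False
while not (terminated or truncated):
    action = env.action_space.sample()  # random policy as a stand-in for an agent
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward

print(f"episode return: {total_reward}")
env.close()
```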

## State of the Art

### Human Starts

| Result | Method | Type | Score from |
| --- | --- | --- | --- |
| 377179.8 | ApeX DQN | DQN | Distributed Prioritized Experience Replay |
| 40425.8 | Human | Human | Massively Parallel Methods for Deep Reinforcement Learning |
| 39096.7 | PERDDQN (prop) | DQN | Prioritized Experience Replay |
| 37361.6 | DuelingDDQN | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 25463.7 | PERDDQN (rank) | DQN | Prioritized Experience Replay |
| 19176.0 | RainbowDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 14498.0 | DDQN | DQN | Deep Reinforcement Learning with Double Q-learning |
| 11848.8 | PERDQN (rank) | DQN | Prioritized Experience Replay |
| 10145.85 | GorilaDQN | DQN | Massively Parallel Methods for Deep Reinforcement Learning |
| 4216.7 | DQN2015 | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 3275.4 | DistributionalDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 2793.9 | DQN2015 | DQN | Massively Parallel Methods for Deep Reinforcement Learning |
| 2355.4 | A3C FF (4 days) | PG | Asynchronous Methods for Deep Reinforcement Learning |
| 2353.1 | NoisyNetDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 2300.2 | A3C FF (1 day) | PG | Asynchronous Methods for Deep Reinforcement Learning |
| 1431.2 | DuelingPERDQN | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 1326.1 | A3C LSTM | PG | Asynchronous Methods for Deep Reinforcement Learning |
| 215.5 | Random | Random | Massively Parallel Methods for Deep Reinforcement Learning |

### No-op Starts

| Result | Method | Type | Score from |
| --- | --- | --- | --- |
| 392952.3 | ApeX DQN | DQN | Distributed Prioritized Experience Replay |
| 266434.0 | C51 | Misc | A Distributional Perspective on Reinforcement Learning |
| 50254.2 | DuelingDDQN | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 44417.4 | PERDDQN (prop) | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 42054.7 | Human | Human | Dueling Network Architectures for Deep Reinforcement Learning |
| 26357.8 | PER | DQN | Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation |
| 26357.8 | PERDDQN (rank) | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 20182 | Human | Human | Human-level control through deep reinforcement learning |
| 19595.0 | DuelingDQN | DQN | Noisy Networks for Exploration |
| 16754.0 | NoisyNet-DuelingDQN | DQN | Noisy Networks for Exploration |
| 16452.7 | DDQN | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 15898.9 | RainbowDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 13169.06 | GorilaDQN | DQN | Massively Parallel Methods for Deep Reinforcement Learning |
| 12361.6 | DQfD | Imitation | Deep Q-Learning from Demonstrations |
| 10932.3 | DDQN+PopArt | DQN | Learning values across many orders of magnitude |
| 7995.0 | DDQN | DQN | Deep Reinforcement Learning with Double Q-learning |
| 5860.6 | DQN2015 | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 5286 | DQN2015 | DQN | Human-level control through deep reinforcement learning |
| 4754.4 | DistributionalDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 4163.0 | DQN | DQN | Noisy Networks for Exploration |
| 2495.4 | NoisyNetDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 2282.0 | NoisyNet-DQN | DQN | Noisy Networks for Exploration |
| 1862.5 | DuelingPERDDQN | DQN | Deep Q-Learning from Demonstrations |
| 1776.0 | ACKTR | PG | Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation |
| 1744.0 | A3C | PG | Noisy Networks for Exploration |
| 943.0 | NoisyNet-A3C | PG | Noisy Networks for Exploration |
| 931.6 | DuelingPERDQN | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 675.5 | Contingency | Misc | Human-level control through deep reinforcement learning |
| 664.8 | Linear | Misc | Human-level control through deep reinforcement learning |
| 68.4 | Random | Random | Human-level control through deep reinforcement learning |
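
In the no-op starts protocol introduced with DQN, each evaluation episode begins with a random number of no-op actions (up to 30) before the agent takes control, so the agent cannot simply memorize a single deterministic trajectory. A minimal sketch of that protocol under the same Gymnasium assumption as above, with a hypothetical `agent.act` interface:

```python
# Sketch of the 30-no-op evaluation protocol behind the "No-op Starts" table.
# `agent` is a hypothetical object exposing act(observation) -> action.
import random
import gymnasium as gym

NOOP_ACTION = 0  # action index 0 is NOOP in the ALE action set


def evaluate_noop_starts(agent, episodes=30, max_noops=30):
    """Average episode return over episodes started with random no-ops."""
    env = gym.make("ALE/Seaquest-v5")
    returns = []
    for ep in range(episodes):
        obs, info = env.reset(seed=ep)
        # Start each episode with a random number of no-ops (1..max_noops).
        for _ in range(random.randint(1, max_noops)):
            obs, _, terminated, truncated, info = env.step(NOOP_ACTION)
            if terminated or truncated:
                obs, info = env.reset()
        episode_return = 0.0
        terminated = truncated = False
        while not (terminated or truncated):
            action = agent.act(obs)  # hypothetical agent interface
            obs, reward, terminated, truncated, info = env.step(action)
            episode_return += reward
        returns.append(episode_return)
    env.close()
    return sum(returns) / len(returns)
```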

### Normal Starts

| Result | Method | Type | Score from |
| --- | --- | --- | --- |
| 28010 | Human | Human | Playing Atari with Deep Reinforcement Learning |
| 1908.6 | TRPO (single path) | PG | Trust Region Policy Optimization |
| 1739.5 | ACER | PG | Proximal Policy Optimization Algorithms |
| 1714.3 | A2C | PG | Proximal Policy Optimization Algorithms |
| 1705 | DQN2013 | DQN | Playing Atari with Deep Reinforcement Learning |
| 1204.5 | PPO | PG | Proximal Policy Optimization Algorithms |
| 788.4 | TRPO (vine) | PG | Trust Region Policy Optimization |
| 723 | Contingency | Misc | Playing Atari with Deep Reinforcement Learning |
| 665 | Sarsa | Misc | Playing Atari with Deep Reinforcement Learning |
| 110 | Random | Random | Playing Atari with Deep Reinforcement Learning |