Overview

The player remotely controls a robot tank in the year 2019. The mission is to use radar to locate enemy rebel tanks rampaging across the countryside and destroy them with the cannon before they reach downtown Santa Clara, California. The enemy is organized into squadrons of 12 tanks each. Defeating a squadron earns the player an additional reserve tank beyond the initial three, up to a maximum of 12. The game ends when all of the player’s tanks are destroyed.

As the player’s tank takes damage, its firepower and/or visual display capabilities are irreparably degraded, and enough damage will eventually destroy the tank. Combat can take place at any time of day or night (displayed on-screen), possibly with rain, snow, or fog (announced in a weather report each morning), which adds the extra challenge of tracking enemy combatants by radar alone.

Description from Wikipedia

State of the Art

Human Starts

| Result | Method | Type | Score from |
| --- | --- | --- | --- |
| 68.5 | ApeX DQN | DQN | Distributed Prioritized Experience Replay |
| 62.0 | DuelingDDQN | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 61.78 | GorilaDQN | DQN | Massively Parallel Methods for Deep Reinforcement Learning |
| 59.1 | DDQN | DQN | Deep Reinforcement Learning with Double Q-learning |
| 58.5 | DQN2015 | DQN | Massively Parallel Methods for Deep Reinforcement Learning |
| 56.2 | PERDDQN (rank) | DQN | Prioritized Experience Replay |
| 55.4 | PERDDQN (prop) | DQN | Prioritized Experience Replay |
| 55.2 | RainbowDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 51.3 | PERDQN (rank) | DQN | Prioritized Experience Replay |
| 50.9 | NoisyNetDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 49.8 | DistributionalDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 32.8 | A3C FF (4 days) | PG | Asynchronous Methods for Deep Reinforcement Learning |
| 24.7 | DuelingPERDQN | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 8.9 | Human | Human | Massively Parallel Methods for Deep Reinforcement Learning |
| 2.6 | A3C LSTM | PG | Asynchronous Methods for Deep Reinforcement Learning |
| 2.4 | Random | Random | Massively Parallel Methods for Deep Reinforcement Learning |
| 2.3 | A3C FF (1 day) | PG | Asynchronous Methods for Deep Reinforcement Learning |
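
For context on how the human-starts numbers above are produced: each evaluation episode begins from a state reached during human play, and the agent’s score is counted only from that point. Below is a minimal sketch of this protocol, assuming Gymnasium >= 1.0 with ale-py, the `ALE/Robotank-v5` environment id, and a hypothetical `human_prefixes` list of recorded human action sequences; none of these are provided by this page, and the sketch is illustrative rather than the exact evaluation harness used in the cited papers.

```python
import random

import gymnasium as gym
import ale_py  # supplies the ALE/... environments

gym.register_envs(ale_py)  # needed for Gymnasium >= 1.0


def evaluate_human_starts(policy, human_prefixes, episodes=30, max_steps=27_000):
    """Average score over episodes that each begin from a human-visited state."""
    env = gym.make("ALE/Robotank-v5")  # assumed environment id for Robot Tank
    returns = []
    for _ in range(episodes):
        obs, _ = env.reset()
        # Replay a randomly chosen prefix of recorded human actions to reach the
        # start state; the agent's score is counted only after this point.
        for action in random.choice(human_prefixes):
            obs, _, terminated, truncated, _ = env.step(action)
            if terminated or truncated:
                obs, _ = env.reset()
        episode_return, done, steps = 0.0, False, 0
        while not done and steps < max_steps:
            action = policy(obs)  # any callable mapping observation -> action
            obs, reward, terminated, truncated, _ = env.step(action)
            episode_return += reward
            done = terminated or truncated
            steps += 1
        returns.append(episode_return)
    env.close()
    return sum(returns) / len(returns)
```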

No-op Starts

| Result | Method | Type | Score from |
| --- | --- | --- | --- |
| 73.8 | ApeX DQN | DQN | Distributed Prioritized Experience Replay |
| 65.3 | DuelingDDQN | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 65.1 | DDQN | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 64.3 | DDQN+PopArt | DQN | Learning values across many orders of magnitude |
| 64.0 | NoisyNet-DuelingDQN | DQN | Noisy Networks for Exploration |
| 63.9 | DQN2015 | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 63.0 | DuelingDQN | DQN | Noisy Networks for Exploration |
| 62.6 | PER | DQN | Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation |
| 62.6 | PERDDQN (rank) | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 61.4 | RainbowDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 58.6 | PERDDQN (prop) | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 55.0 | DQN | DQN | Noisy Networks for Exploration |
| 54.2 | DistributionalDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 53.5 | NoisyNetDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 52.3 | C51 | Misc | A Distributional Perspective on Reinforcement Learning |
| 51.6 | DQN2015 | DQN | Human-level control through deep reinforcement learning |
| 51.0 | NoisyNet-DQN | DQN | Noisy Networks for Exploration |
| 46.7 | DDQN | DQN | Deep Reinforcement Learning with Double Q-learning |
| 36.43 | GorilaDQN | DQN | Massively Parallel Methods for Deep Reinforcement Learning |
| 36.0 | NoisyNet-A3C | PG | Noisy Networks for Exploration |
| 28.7 | Linear | Misc | Human-level control through deep reinforcement learning |
| 27.5 | DuelingPERDQN | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 16.5 | ACKTR | PG | Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation |
| 12.4 | Contingency | Misc | Human-level control through deep reinforcement learning |
| 11.9 | Human | Human | Human-level control through deep reinforcement learning |
| 6.0 | A3C | PG | Noisy Networks for Exploration |
| 2.2 | Random | Random | Human-level control through deep reinforcement learning |
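
The no-op-starts scores above follow the standard Atari protocol in which each episode begins with a random number (up to 30) of no-op actions before the agent takes control, so evaluation does not always start from the identical deterministic state. A minimal sketch of that protocol, again assuming Gymnasium >= 1.0 with ale-py, the `ALE/Robotank-v5` id, and an arbitrary `policy` callable (all assumptions, not part of this page):

```python
import random

import gymnasium as gym
import ale_py  # supplies the ALE/... environments

gym.register_envs(ale_py)  # needed for Gymnasium >= 1.0


def evaluate_noop_starts(policy, episodes=30, max_noops=30, max_steps=27_000):
    """Average score over episodes that each start with 1-30 no-op actions."""
    env = gym.make("ALE/Robotank-v5")  # assumed environment id for Robot Tank
    noop_action = 0  # action index 0 is NOOP in the ALE action set
    returns = []
    for _ in range(episodes):
        obs, _ = env.reset()
        # Execute a random number of no-ops before the agent takes control.
        for _ in range(random.randint(1, max_noops)):
            obs, _, terminated, truncated, _ = env.step(noop_action)
            if terminated or truncated:
                obs, _ = env.reset()
        episode_return, done, steps = 0.0, False, 0
        while not done and steps < max_steps:
            obs, reward, terminated, truncated, _ = env.step(policy(obs))
            episode_return += reward
            done = terminated or truncated
            steps += 1
        returns.append(episode_return)
    env.close()
    return sum(returns) / len(returns)
```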

Normal Starts

| Result | Method | Type | Score from |
| --- | --- | --- | --- |
| 5.5 | PPO | PG | Proximal Policy Optimization Algorithms |
| 2.5 | ACER | PG | Proximal Policy Optimization Algorithms |
| 2.2 | A2C | PG | Proximal Policy Optimization Algorithms |