Overview

Gameplay is on a flat plane with a mountainous horizon featuring an erupting volcano, a distant crescent moon, and various geometric solids (in vector outline) such as pyramids and blocks. The player views the screen, which includes an overhead radar view, to find and destroy the rather slow tanks or the faster-moving supertanks. Saucer-shaped UFOs and guided missiles occasionally appear and are worth bonus points. The saucers differ from the tanks in that they do not fire upon the player and do not appear on radar. The player can hide behind the solids or, once fired upon, maneuver in rapid turns to buy time with which to fire.

The geometric solid obstacles are indestructible and can block the movement of the player's tank. However, they also serve as shields, since they block enemy fire as well.

Description from Wikipedia

State of the Art

Human Starts

| Result | Method | Type | Score from |
| --- | --- | --- | --- |
| 92275.0 | ApeX DQN | DQN | Distributed Prioritized Experience Replay |
| 52040.0 | RainbowDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 33030.0 | Human | Human | Massively Parallel Methods for Deep Reinforcement Learning |
| 32250.0 | DistributionalDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 31320.0 | DuelingDDQN | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 30650.0 | DuelingPERDQN | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 29100.0 | PERDDQN (prop) | DQN | Prioritized Experience Replay |
| 26985.0 | NoisyNetDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 25520.0 | PERDDQN (rank) | DQN | Prioritized Experience Replay |
| 24740.0 | DDQN | DQN | Deep Reinforcement Learning with Double Q-learning |
| 23750.0 | DQN2015 | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 22250.0 | PERDQN (rank) | DQN | Prioritized Experience Replay |
| 20760.0 | A3C LSTM | PG | Asynchronous Methods for Deep Reinforcement Learning |
| 19938.0 | GorilaDQN | DQN | Massively Parallel Methods for Deep Reinforcement Learning |
| 17560.0 | DQN2015 | DQN | Massively Parallel Methods for Deep Reinforcement Learning |
| 12950.0 | A3C FF (4 days) | PG | Asynchronous Methods for Deep Reinforcement Learning |
| 11340.0 | A3C FF (1 day) | PG | Asynchronous Methods for Deep Reinforcement Learning |
| 3560.0 | Random | Random | Massively Parallel Methods for Deep Reinforcement Learning |

No-op Starts

| Result | Method | Type | Score from |
| --- | --- | --- | --- |
| 98895.0 | ApeX DQN | DQN | Distributed Prioritized Experience Replay |
| 62010.0 | RainbowDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 52262.0 | NoisyNet-DuelingDQN | DQN | Noisy Networks for Exploration |
| 41993.7 | DuelingPERDDQN | DQN | Deep Q-Learning from Demonstrations |
| 41708.2 | DQfD | Imitation | Deep Q-Learning from Demonstrations |
| 41145.0 | DistributionalDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 40481.0 | DuelingDQN | DQN | Noisy Networks for Exploration |
| 38130.0 | PERDDQN (prop) | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 37800 | Human | Human | Human-level control through deep reinforcement learning |
| 37187.5 | Human | Human | Dueling Network Architectures for Deep Reinforcement Learning |
| 37150.0 | DuelingDDQN | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 36786.0 | NoisyNet-DQN | DQN | Noisy Networks for Exploration |
| 35520.0 | DuelingPERDQN | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 32050.0 | NoisyNetDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 31700.0 | DDQN | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 31530.0 | PER | DQN | Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation |
| 31530.0 | PERDDQN (rank) | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 29900.0 | DQN2015 | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 28981.0 | DQN | DQN | Noisy Networks for Exploration |
| 28742.0 | C51 | Misc | A Distributional Perspective on Reinforcement Learning |
| 26300 | DQN2015 | DQN | Human-level control through deep reinforcement learning |
| 25730.0 | DDQN | DQN | Deep Reinforcement Learning with Double Q-learning |
| 25266.66 | GorilaDQN | DQN | Massively Parallel Methods for Deep Reinforcement Learning |
| 17871.0 | NoisyNet-A3C | PG | Noisy Networks for Exploration |
| 16411.0 | A3C | PG | Noisy Networks for Exploration |
| 15820 | Linear | Misc | Human-level control through deep reinforcement learning |
| 8910.0 | ACKTR | PG | Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation |
| 8220.0 | DDQN+PopArt | DQN | Learning values across many orders of magnitude |
| 2360 | Random | Random | Human-level control through deep reinforcement learning |
| 16.2 | Contingency | Misc | Human-level control through deep reinforcement learning |

Normal Starts

| Result | Method | Type | Score from |
| --- | --- | --- | --- |
| 17366.7 | PPO | PG | Proximal Policy Optimization Algorithms |
| 8983.3 | ACER | PG | Proximal Policy Optimization Algorithms |
| 3080.0 | A2C | PG | Proximal Policy Optimization Algorithms |