Overview

Bank Heist is a maze video game released by 20th Century Fox for the Atari 2600.

Each level in Bank Heist is a maze-like city, similar to Pac-Man. The objective is to rob as many banks as possible while evading the police. The player controls the Getaway Car, which has a limited amount of fuel that can be refilled by moving to a new city. Robbing a bank causes a cop car to appear, along with another bank; up to three cop cars can be present in a city at a time. Cop cars can be destroyed by dropping dynamite out of the Getaway Car's tail pipe, though the dynamite can also destroy the Getaway Car itself. The player starts with four spare cars (lives); a life is lost by running out of fuel, being hit by dynamite, or colliding with a cop car. Robbing nine banks in one city earns an extra car.

The left and right difficulty switches control how hard the game is. With the left switch set to A, the cop cars are smarter at chasing the Getaway Car; set to B, they move in a more fixed pattern. With the right switch set to A, the banks appear in random locations; set to B, they appear in preset locations.

Description from Wikipedia

State of the Art

Human Starts

| Result | Method | Type | Score from |
|-------:|--------|------|------------|
| 1200.8 | ApeX DQN | DQN | Distributed Prioritized Experience Replay |
| 1129.3 | DuelingDDQN | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 1004.6 | DuelingPERDQN | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 970.1 | A3C FF (4 days) | PG | Asynchronous Methods for Deep Reinforcement Learning |
| 955.0 | NoisyNetDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 946.0 | A3C FF (1 day) | PG | Asynchronous Methods for Deep Reinforcement Learning |
| 932.8 | A3C LSTM | PG | Asynchronous Methods for Deep Reinforcement Learning |
| 886.0 | DDQN | DQN | Deep Reinforcement Learning with Double Q-learning |
| 876.6 | PERDDQN (rank) | DQN | Prioritized Experience Replay |
| 835.6 | DistributionalDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 826.0 | RainbowDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 823.7 | PERDQN (rank) | DQN | Prioritized Experience Replay |
| 816.8 | PERDDQN (prop) | DQN | Prioritized Experience Replay |
| 644.5 | Human | Human | Massively Parallel Methods for Deep Reinforcement Learning |
| 399.42 | GorilaDQN | DQN | Massively Parallel Methods for Deep Reinforcement Learning |
| 312.7 | DQN2015 | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 176.3 | DQN2015 | DQN | Massively Parallel Methods for Deep Reinforcement Learning |
| 21.7 | Random | Random | Massively Parallel Methods for Deep Reinforcement Learning |
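Results like these are often reported as human-normalized scores, (agent − random) / (human − random), using the Human and Random baselines listed in the table. A minimal sketch of that calculation (the specific scores plugged in below are taken from the human-starts table):

```python
def human_normalized(agent: float, human: float, random: float) -> float:
    """Human-normalized score: 1.0 is human-level, 0.0 is random-level."""
    return (agent - random) / (human - random)

# Human and Random baselines from the human-starts table.
HUMAN, RANDOM = 644.5, 21.7

# ApeX DQN's 1200.8 is roughly 189% of the random-to-human score gap.
print(f"{human_normalized(1200.8, HUMAN, RANDOM):.1%}")
```

By this measure, every method scoring above 644.5 in the table is superhuman on Bank Heist under human starts.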

No-op Starts

| Result | Method | Type | Score from |
|-------:|--------|------|------------|
| 1716.4 | ApeX DQN | DQN | Distributed Prioritized Experience Replay |
| 1611.9 | DuelingDDQN | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 1503.1 | DuelingPERDQN | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 1428.0 | DuelingDQN | DQN | Noisy Networks for Exploration |
| 1358.0 | RainbowDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 1323.0 | NoisyNetDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 1318.0 | NoisyNet-DuelingDQN | DQN | Noisy Networks for Exploration |
| 1296.0 | A3C | PG | Noisy Networks for Exploration |
| 1289.7 | ACKTR | PG | Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation |
| 1280.2 | DQfD | Imitation | Deep Q-Learning from Demonstrations |
| 1240.8 | DuelingPERDDQN | DQN | Deep Q-Learning from Demonstrations |
| 1126.8 | PERDDQN (prop) | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 1103.3 | DDQN+PopArt | DQN | Learning values across many orders of magnitude |
| 1068.0 | NoisyNet-DQN | DQN | Noisy Networks for Exploration |
| 1056.7 | DistributionalDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 1054.6 | PER | DQN | Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation |
| 1054.6 | PERDDQN (rank) | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 1033.0 | NoisyNet-A3C | PG | Noisy Networks for Exploration |
| 1030.6 | DDQN | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 976.0 | C51 | Misc | A Distributional Perspective on Reinforcement Learning |
| 753.1 | Human | Human | Dueling Network Architectures for Deep Reinforcement Learning |
| 734.4 | Human | Human | Human-level control through deep reinforcement learning |
| 728.3 | DDQN | DQN | Deep Reinforcement Learning with Double Q-learning |
| 609.0 | GorilaDQN | DQN | Massively Parallel Methods for Deep Reinforcement Learning |
| 455.0 | DQN2015 | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 455.0 | DQN | DQN | Noisy Networks for Exploration |
| 429.7 | DQN2015 | DQN | Human-level control through deep reinforcement learning |
| 190.8 | Linear | Misc | Human-level control through deep reinforcement learning |
| 67.4 | Contingency | Misc | Human-level control through deep reinforcement learning |
| 14.2 | Random | Random | Human-level control through deep reinforcement learning |
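The three headings correspond to the standard Atari evaluation regimes: human starts begin each episode from a state sampled from recorded human play, no-op starts apply a random number of no-op actions (up to 30 in the DQN papers) after reset so the agent cannot memorize a single deterministic start state, and normal starts use a plain reset. A minimal sketch of the no-op-start reset logic, using a hypothetical stand-in environment rather than a real ALE emulator:

```python
import random

NOOP_ACTION = 0   # ALE action index 0 is the no-op
MAX_NOOPS = 30    # cap used in the DQN evaluation protocol

class ToyEnv:
    """Stand-in for an ALE environment (hypothetical, for illustration)."""
    def __init__(self):
        self.t = 0
    def reset(self):
        self.t = 0
        return self.t                  # "observation"
    def step(self, action):
        self.t += 1
        return self.t, 0.0, False      # observation, reward, done

def noop_start_reset(env, rng):
    """Reset env, then apply 1..MAX_NOOPS no-ops before the agent acts."""
    obs = env.reset()
    for _ in range(rng.randint(1, MAX_NOOPS)):
        obs, _, done = env.step(NOOP_ACTION)
        if done:                       # rare: restart if no-ops end the episode
            obs = env.reset()
    return obs

obs = noop_start_reset(ToyEnv(), random.Random(0))
assert 1 <= obs <= MAX_NOOPS
```

This mirrors the `NoopResetEnv`-style wrappers found in common Atari preprocessing code.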

Normal Starts

| Result | Method | Type | Score from |
|-------:|--------|------|------------|
| 1280.6 | PPO | PG | Proximal Policy Optimization Algorithms |
| 1177.5 | ACER | PG | Proximal Policy Optimization Algorithms |
| 1095.3 | A2C | PG | Proximal Policy Optimization Algorithms |