Overview

Taking on the role of an explorer raiding Tutankhamun’s tomb, the player navigates dozens of rooms while being chased by creatures such as asps, vultures, parrots, bats, dragons, and even curses, all of which kill the player on contact. The explorer can fight back by firing a laser at the creatures, but only to the left and right. The player is also given a single screen-clearing “flash bomb” per room or life. Finally, each room contains warp zones that teleport the player around the room and which enemies cannot use.

To progress, the player collects keys to open locked doors throughout the rooms while searching for the large exit door. Optional treasures can be picked up for bonus points. Each room has a timer; when it reaches zero the explorer can no longer fire the laser, and once a room is cleared the remaining time is converted into bonus points.

Description from Wikipedia

State of the Art

Human Starts

| Result | Method | Type | Score from |
|--------|--------|------|------------|
| 156.3 | A3C FF (4 days) | PG | Asynchronous Methods for Deep Reinforcement Learning |
| 144.2 | A3C LSTM | PG | Asynchronous Methods for Deep Reinforcement Learning |
| 138.3 | Human | Human | Massively Parallel Methods for Deep Reinforcement Learning |
| 127.7 | ApeX DQN | DQN | Distributed Prioritized Experience Replay |
| 126.9 | RainbowDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 124.3 | DistributionalDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 123.3 | NoisyNetDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 118.45 | GorilaDQN | DQN | Massively Parallel Methods for Deep Reinforcement Learning |
| 108.6 | DuelingPERDQN | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 96.5 | PERDQN (rank) | DQN | Prioritized Experience Replay |
| 92.2 | DDQN | DQN | Deep Reinforcement Learning with Double Q-learning |
| 56.9 | PERDDQN (rank) | DQN | Prioritized Experience Replay |
| 48.0 | DuelingDDQN | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 45.6 | DQN2015 | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 33.6 | PERDDQN (prop) | DQN | Prioritized Experience Replay |
| 32.4 | DQN2015 | DQN | Massively Parallel Methods for Deep Reinforcement Learning |
| 26.1 | A3C FF (1 day) | PG | Asynchronous Methods for Deep Reinforcement Learning |
| 12.7 | Random | Random | Massively Parallel Methods for Deep Reinforcement Learning |

No-op Starts

| Result | Method | Type | Score from |
|--------|--------|------|------------|
| 314.3 | ACKTR | PG | Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation |
| 280.0 | C51 | Misc | A Distributional Perspective on Reinforcement Learning |
| 280.0 | DuelingDQN | DQN | Noisy Networks for Exploration |
| 272.6 | ApeX DQN | DQN | Distributed Prioritized Experience Replay |
| 269.0 | NoisyNet-DuelingDQN | DQN | Noisy Networks for Exploration |
| 249.4 | DistributionalDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 245.9 | DuelingPERDQN | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 244.97 | GorilaDQN | DQN | Massively Parallel Methods for Deep Reinforcement Learning |
| 241.0 | RainbowDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 232.0 | NoisyNet-DQN | DQN | Noisy Networks for Exploration |
| 231.6 | NoisyNetDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 218.4 | DDQN | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 218.0 | DQN | DQN | Noisy Networks for Exploration |
| 213.0 | A3C | PG | Noisy Networks for Exploration |
| 211.4 | DuelingDDQN | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 204.6 | PER | DQN | Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation |
| 204.6 | PERDDQN (rank) | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 190.6 | DDQN | DQN | Deep Reinforcement Learning with Double Q-learning |
| 186.7 | DQN2015 | DQN | Human-level control through deep reinforcement learning |
| 183.9 | DDQN+PopArt | DQN | Learning values across many orders of magnitude |
| 167.6 | Human | Human | Human-level control through deep reinforcement learning |
| 164.0 | NoisyNet-A3C | PG | Noisy Networks for Exploration |
| 114.3 | Linear | Misc | Human-level control through deep reinforcement learning |
| 98.2 | Contingency | Misc | Human-level control through deep reinforcement learning |
| 87.2 | PERDDQN (prop) | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 68.1 | DQN2015 | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 11.4 | Random | Random | Human-level control through deep reinforcement learning |

Normal Starts

| Result | Method | Type | Score from |
|--------|--------|------|------------|
| 280.8 | ACER | PG | Proximal Policy Optimization Algorithms |
| 254.4 | PPO | PG | Proximal Policy Optimization Algorithms |
| 206.8 | A2C | PG | Proximal Policy Optimization Algorithms |