Overview

Montezuma’s Revenge is an early example of the Metroidvania genre. The player controls a character called Panama Joe (a.k.a. Pedro), moving him from room to room in the labyrinthine underground pyramid of the 16th-century Aztec temple of Emperor Montezuma II, filled with enemies, obstacles, traps, and dangers. The objective is to score points by gathering jewels and killing enemies along the way. Panama Joe must find keys to open doors, collect and use equipment such as torches, swords, and amulets, and avoid or defeat the challenges in his path. Obstacles include laser gates, conveyor belts, disappearing floors, and fire pits.

Movement is achieved by jumping, running, sliding down poles, and climbing chains and ladders. Enemies include skulls, snakes, and spiders. A further complication arises on the bottommost floors of each pyramid, which must be played in total darkness unless a torch is found.

The pyramid is nine floors deep, not counting the topmost entry room that the player drops into at the start of each level, and has 99 rooms to explore. The goal is to reach the Treasure Chamber, whose entrance is in the center room of the lowest level. After jumping in, the player has a short time to jump from one chain to another and pick up as many jewels as possible. However, jumping onto a fireman’s pole immediately takes the player to the next level; when time runs out, the player is automatically thrown onto the pole.

There are nine difficulty levels in all. Though the basic layout of the pyramid remains the same from one level to the next, small changes in details force the player to rethink strategy. These changes include:

  • Blocking or opening up certain paths (by adding/removing walls or ladders)
  • Adding enemies and obstacles
  • Rearranging items
  • Adding more dark rooms and removing torches (in level 9, the entire pyramid is dark)
  • Enemies that do not disappear after they kill Panama Joe (from level 5 onward)
  • Restricting exploration to the left half of the pyramid in level 1 and the right half in level 2; from level 3 onward, the entire pyramid is open for exploration

Description from Wikipedia

State of the Art

Human Starts

| Result | Method | Type | Score from |
| ------ | ------ | ---- | ---------- |
| 4182.0 | Human | Human | Massively Parallel Methods for Deep Reinforcement Learning |
| 1079.0 | ApeX DQN | DQN | Distributed Prioritized Experience Replay |
| 154.0 | RainbowDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 130.0 | DistributionalDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 84.0 | GorilaDQN | DQN | Massively Parallel Methods for Deep Reinforcement Learning |
| 67.0 | A3C FF (4 days) | PG | Asynchronous Methods for Deep Reinforcement Learning |
| 55.0 | NoisyNetDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 53.0 | A3C FF (1 day) | PG | Asynchronous Methods for Deep Reinforcement Learning |
| 51.0 | PERDDQN (rank) | DQN | Prioritized Experience Replay |
| 50.0 | DQN2015 | DQN | Massively Parallel Methods for Deep Reinforcement Learning |
| 47.0 | DQN2015 | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 44.0 | PERDQN (rank) | DQN | Prioritized Experience Replay |
| 42.0 | DDQN | DQN | Deep Reinforcement Learning with Double Q-learning |
| 41.0 | A3C LSTM | PG | Asynchronous Methods for Deep Reinforcement Learning |
| 25.0 | Random | Random | Massively Parallel Methods for Deep Reinforcement Learning |
| 24.0 | DuelingPERDQN | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 22.0 | DuelingDDQN | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 13.0 | PERDDQN (prop) | DQN | Prioritized Experience Replay |

No-op Starts

| Result | Method | Type | Score from |
| ------ | ------ | ---- | ---------- |
| 41098.4 | YouTube | Imitation | Playing hard exploration games by watching YouTube |
| 35926.1 | YouTube (imitation only) | Imitation | Playing hard exploration games by watching YouTube |
| 4753.3 | Human | Human | Dueling Network Architectures for Deep Reinforcement Learning |
| 4659.7 | DQfD | Imitation | Playing hard exploration games by watching YouTube |
| 4638.4 | DQfD | Imitation | Deep Q-Learning from Demonstrations |
| 4367.0 | Human | Human | Human-level control through deep reinforcement learning |
| 2500.0 | ApeX DQN | DQN | Distributed Prioritized Experience Replay |
| 2500.0 | ApeX DQN | DQN | Playing hard exploration games by watching YouTube |
| 384.0 | RainbowDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 367.0 | DistributionalDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 259.0 | Contingency | Misc | Human-level control through deep reinforcement learning |
| 57.0 | NoisyNet-DuelingDQN | DQN | Noisy Networks for Exploration |
| 14.0 | A3C | PG | Noisy Networks for Exploration |
| 10.7 | Linear | Misc | Human-level control through deep reinforcement learning |
| 4.16 | GorilaDQN | DQN | Massively Parallel Methods for Deep Reinforcement Learning |
| 4.0 | NoisyNet-A3C | PG | Noisy Networks for Exploration |
| 3.0 | NoisyNet-DQN | DQN | Noisy Networks for Exploration |
| 2.0 | DQN | DQN | Noisy Networks for Exploration |
| 0.1 | DuelingPERDDQN | DQN | Deep Q-Learning from Demonstrations |
| 0.0 | DQN2015 | DQN | Human-level control through deep reinforcement learning |
| 0.0 | DDQN | DQN | Deep Reinforcement Learning with Double Q-learning |
| 0.0 | Random | Random | Human-level control through deep reinforcement learning |
| 0.0 | DuelingDDQN | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 0.0 | DuelingPERDQN | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 0.0 | C51 | Misc | A Distributional Perspective on Reinforcement Learning |
| 0.0 | PERDDQN (rank) | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 0.0 | DuelingDQN | DQN | Noisy Networks for Exploration |
| 0.0 | DDQN+PopArt | DQN | Learning values across many orders of magnitude |
| 0.0 | PERDDQN (prop) | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 0.0 | NoisyNetDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
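The "Human Starts" and "No-op Starts" headings refer to the standard Atari evaluation protocols used in the papers above: human starts begin episodes from states sampled from human play, while no-op starts begin each episode with a random number of no-op actions (up to 30) before the agent takes control. Below is a minimal sketch of no-op starts evaluation, assuming gymnasium with ale-py is installed; the environment id, the `evaluate_noop_starts` helper, and the random policy used as a stand-in for a trained agent are illustrative assumptions, not taken from the listed papers.

```python
import random

import ale_py
import gymnasium as gym

# Registers the ALE/* environment ids (gymnasium >= 1.0 / ale-py >= 0.10);
# on older versions a plain `import ale_py` may already register them.
gym.register_envs(ale_py)


def evaluate_noop_starts(env_id="ALE/MontezumaRevenge-v5",
                         episodes=10, noop_max=30, seed=0):
    """Average episode return under the no-op starts protocol."""
    env = gym.make(env_id)
    rng = random.Random(seed)
    returns = []
    for ep in range(episodes):
        obs, info = env.reset(seed=seed + ep)
        # No-op starts: hold action 0 (NOOP) for a random 1..noop_max steps
        # so the agent never sees a single deterministic initial state.
        for _ in range(rng.randint(1, noop_max)):
            obs, _, terminated, truncated, info = env.step(0)
            if terminated or truncated:
                obs, info = env.reset()
        episode_return, done = 0.0, False
        while not done:
            # Stand-in for a trained policy; replace with agent.act(obs).
            action = env.action_space.sample()
            obs, reward, terminated, truncated, info = env.step(action)
            episode_return += reward
            done = terminated or truncated
        returns.append(episode_return)
    env.close()
    return sum(returns) / len(returns)


if __name__ == "__main__":
    print(evaluate_noop_starts(episodes=3))
```

With a random policy this will score close to the 0.0 "Random" entry in the table above; the protocol itself is what the reported numbers have in common.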

Normal Starts

| Result | Method | Type | Score from |
| ------ | ------ | ---- | ---------- |
| 42.0 | PPO | PG | Proximal Policy Optimization Algorithms |
| 0.3 | ACER | PG | Proximal Policy Optimization Algorithms |
| 0.0 | A2C | PG | Proximal Policy Optimization Algorithms |