Overview

The player controls Pitfall Harry through a maze-like jungle, attempting to recover 32 treasures within a 20-minute time limit. Along the way, the player must maneuver around numerous hazards, including pits, quicksand, rolling logs, fire, rattlesnakes, scorpions, and crocodiles. Harry may jump over or otherwise avoid these obstacles by climbing, running, or swinging on vines. Treasures include bags of money, gold and silver bars, and diamond rings, which range in value from 2000 to 5000 points in 1000-point increments. There are eight of each treasure type, 32 in total, and a perfect score of 114,000 is achieved by claiming all 32 treasures without losing any points. Points are deducted for falling into a hole (100 points) or for touching rolling logs, where the loss depends on how long contact with the log lasts.

Beneath the jungle runs a tunnel that Harry can enter through ladders found at various points. Traveling through the tunnel moves Harry forward three screens at a time, which is necessary to collect all the treasures within the time limit. However, the tunnels contain dead ends blocked by brick walls, forcing the player to return to the surface at one of the ladders and find another route, wasting time. The tunnels also contain scorpions.

The player loses a life if Harry touches any obstacle (except logs) or falls into a tar pit, quicksand, waterhole, or the mouth of a crocodile. The game ends when all 32 treasures have been collected, all three lives have been lost, or time runs out.
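
The scoring rules above can be summarized in a short sketch. This is purely illustrative: the treasure names, the event-based accounting, and the example log-contact loss are assumptions made for the example, not details taken from the game's code.

```python
# Illustrative tally of the scoring rules described above (assumed event names).
TREASURE_VALUES = {
    "money_bag": 2000,
    "silver_bar": 3000,
    "gold_bar": 4000,
    "diamond_ring": 5000,
}
HOLE_PENALTY = 100  # points lost each time Harry falls into a hole


def run_score(treasures, holes_fallen, log_contact_loss):
    """Treasures add points; falling into holes and log contact subtract them."""
    gained = sum(TREASURE_VALUES[t] for t in treasures)
    lost = HOLE_PENALTY * holes_fallen + log_contact_loss
    return gained - lost


# Example: two diamond rings collected, one fall, 35 points lost while touching a log.
print(run_score(["diamond_ring", "diamond_ring"], 1, 35))  # 9865
```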

Description from Wikipedia

State of the Art

Human Starts

| Result | Method | Type | Score from |
|---|---|---|---|
| 5998.9 | Human | Human | Deep Reinforcement Learning with Double Q-learning |
| -14.8 | PERDDQN (prop) | DQN | Prioritized Experience Replay |
| -37.6 | RainbowDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| -46.9 | DuelingDDQN | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| -78.5 | A3C FF (4 days) | PG | Asynchronous Methods for Deep Reinforcement Learning |
| -113.2 | DQN2015 | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| -123.0 | A3C FF (1 day) | PG | Asynchronous Methods for Deep Reinforcement Learning |
| -135.7 | A3C LSTM | PG | Asynchronous Methods for Deep Reinforcement Learning |
| -154.4 | NoisyNetDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| -186.7 | DDQN | DQN | Deep Reinforcement Learning with Double Q-learning |
| -193.7 | PERDQN (rank) | DQN | Prioritized Experience Replay |
| -243.6 | DuelingPERDQN | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| -273.3 | ApeX DQN | DQN | Distributed Prioritized Experience Replay |
| -342.8 | DistributionalDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| -348.8 | Random | Random | Deep Reinforcement Learning with Double Q-learning |
| -427.0 | PERDDQN (rank) | DQN | Prioritized Experience Replay |
| -9001.0 | DQN2015 | DQN | Asynchronous Methods for Deep Reinforcement Learning |
| -9001.0 | GorilaDQN | DQN | Asynchronous Methods for Deep Reinforcement Learning |

No-op Starts

| Result | Method | Type | Score from |
|---|---|---|---|
| 60258.9 | YouTube | Imitation | Playing hard exploration games by watching YouTube |
| 53407.6 | YouTube (imitation only) | Imitation | Playing hard exploration games by watching YouTube |
| 6463.7 | Human | Human | Dueling Network Architectures for Deep Reinforcement Learning |
| 57.3 | DQfD | Imitation | Deep Q-Learning from Demonstrations |
| 0.0 | DuelingDDQN | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 0.0 | DuelingPERDQN | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 0.0 | DuelingPERDDQN | DQN | Deep Q-Learning from Demonstrations |
| 0.0 | C51 | Misc | A Distributional Perspective on Reinforcement Learning |
| 0.0 | DQN | DQN | Noisy Networks for Exploration |
| 0.0 | NoisyNet-DQN | DQN | Noisy Networks for Exploration |
| 0.0 | A3C | PG | Noisy Networks for Exploration |
| 0.0 | NoisyNet-A3C | PG | Noisy Networks for Exploration |
| 0.0 | DuelingDQN | DQN | Noisy Networks for Exploration |
| 0.0 | NoisyNet-DuelingDQN | DQN | Noisy Networks for Exploration |
| 0.0 | PERDDQN (prop) | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 0.0 | RainbowDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| -0.6 | ApeX DQN | DQN | Distributed Prioritized Experience Replay |
| -0.6 | ApeX DQN | DQN | Playing hard exploration games by watching YouTube |
| -1.1 | ACKTR | PG | Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation |
| -2.1 | DistributionalDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| -2.6 | DDQN+PopArt | DQN | Learning values across many orders of magnitude |
| -18.2 | NoisyNetDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| -29.9 | DDQN | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| -229.4 | Random | Random | Dueling Network Architectures for Deep Reinforcement Learning |
| -286.1 | DQN2015 | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| -356.5 | PER | DQN | Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation |
| -356.5 | PERDDQN (rank) | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
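
The "No-op Starts" rows above follow the common Arcade Learning Environment evaluation protocol in which each episode begins with a random number of no-op actions (typically up to 30) before the agent takes control; "Human Starts" instead begins episodes from states sampled from human play. Below is a minimal sketch of no-op-start evaluation, assuming the Gymnasium ALE bindings and the environment id `ALE/Pitfall-v5`; with `policy=None` it plays uniformly random actions, roughly matching the Random baseline rows.

```python
import random

import gymnasium as gym
import ale_py  # assumes ale-py and the Pitfall ROM are installed

gym.register_envs(ale_py)  # explicit registration of the ALE/* environments


def evaluate_noop_starts(env_id="ALE/Pitfall-v5", policy=None, episodes=10, max_noops=30):
    """Average episode return under the no-op starts protocol."""
    env = gym.make(env_id)
    returns = []
    for _ in range(episodes):
        obs, _ = env.reset()
        total, done = 0.0, False
        # Random no-op prefix before the agent acts (action 0 is NOOP in ALE).
        for _ in range(random.randint(1, max_noops)):
            obs, reward, terminated, truncated, _ = env.step(0)
            total += reward
            done = terminated or truncated
            if done:
                break
        while not done:
            action = policy(obs) if policy else env.action_space.sample()
            obs, reward, terminated, truncated, _ = env.step(action)
            total += reward
            done = terminated or truncated
        returns.append(total)
    env.close()
    return sum(returns) / len(returns)
```

The "Normal Starts" results in the next table simply reset the emulator to its initial state with no no-op prefix and no human-start sampling.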

Normal Starts

| Result | Method | Type | Score from |
|---|---|---|---|
| -16.9 | ACER | PG | Proximal Policy Optimization Algorithms |
| -32.9 | PPO | PG | Proximal Policy Optimization Algorithms |
| -55.0 | A2C | PG | Proximal Policy Optimization Algorithms |