Overview

The gopher tunnels left, right, and up toward the surface. When he digs a hole through to the surface, he attempts to steal a carrot. The farmer must hit the gopher to send him back underground, or fill in the holes to keep him from reaching the surface. Once the gopher has taken any of the three carrots, a pelican will occasionally fly overhead and drop a seed; if the farmer catches it, he can plant it in place of the missing carrot. The longer the game goes on, the faster the gopher becomes. The game ends when the gopher has removed all three carrots. There are two skill levels and one- or two-player modes, giving a total of four game variations.

Description from Wikipedia
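
For readers who want to poke at the environment directly, here is a minimal sketch that runs one episode with a uniform-random policy. It assumes the classic OpenAI Gym Atari API (`gym` < 0.26 with the Atari ROMs installed); `Gopher-v0` is Gym's standard id for this game, and a random agent should score near the Random baselines in the tables below.

```python
# Minimal random-agent rollout on Gopher, assuming the classic Gym API
# (reset() returns obs; step() returns obs, reward, done, info).
import gym

env = gym.make("Gopher-v0")
obs = env.reset()
total_reward, done = 0.0, False
while not done:
    action = env.action_space.sample()          # uniform random policy
    obs, reward, done, info = env.step(action)
    total_reward += reward
env.close()
print(f"episode return: {total_reward}")        # roughly the Random baseline below
```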

State of the Art

Human Starts

| Result | Method | Type | Score from |
|---|---|---|---|
| 121168.2 | ApeX DQN | DQN | Distributed Prioritized Experience Replay |
| 105148.4 | DuelingPERDQN | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 72595.7 | RainbowDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 57783.8 | PERDDQN (prop) | DQN | Prioritized Experience Replay |
| 34858.8 | PERDDQN (rank) | DQN | Prioritized Experience Replay |
| 27778.3 | DistributionalDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 20051.4 | DuelingDDQN | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 17478.2 | PERDQN (rank) | DQN | Prioritized Experience Replay |
| 17106.8 | A3C LSTM | PG | Asynchronous Methods for Deep Reinforcement Learning |
| 15253.0 | DDQN | DQN | Deep Reinforcement Learning with Double Q-learning |
| 13131.0 | NoisyNetDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 10022.8 | A3C FF (4 days) | PG | Asynchronous Methods for Deep Reinforcement Learning |
| 8442.8 | A3C FF (1 day) | PG | Asynchronous Methods for Deep Reinforcement Learning |
| 8190.4 | DQN2015 | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 4373.04 | GorilaDQN | DQN | Massively Parallel Methods for Deep Reinforcement Learning |
| 2731.8 | DQN2015 | DQN | Massively Parallel Methods for Deep Reinforcement Learning |
| 2311.0 | Human | Human | Massively Parallel Methods for Deep Reinforcement Learning |
| 250.0 | Random | Random | Massively Parallel Methods for Deep Reinforcement Learning |

No-op Starts

| Result | Method | Type | Score from |
|---|---|---|---|
| 120500.9 | ApeX DQN | DQN | Distributed Prioritized Experience Replay |
| 104368.2 | DuelingPERDQN | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 70354.6 | RainbowDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 56218.2 | DDQN+PopArt | DQN | Learning values across many orders of magnitude |
| 49097.4 | PERDDQN (prop) | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 47730.8 | ACKTR | PG | Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation |
| 38909.0 | NoisyNet-DuelingDQN | DQN | Noisy Networks for Exploration |
| 33641.0 | C51 | Misc | A Distributional Perspective on Reinforcement Learning |
| 32487.2 | PER | DQN | Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation |
| 32487.2 | PERDDQN (rank) | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 28841.0 | DistributionalDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 27313.0 | DuelingDQN | DQN | Noisy Networks for Exploration |
| 15718.4 | DuelingDDQN | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 15107.9 | NoisyNetDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 14840.8 | DDQN | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 14574.0 | NoisyNet-DQN | DQN | Noisy Networks for Exploration |
| 12439.0 | NoisyNet-A3C | PG | Noisy Networks for Exploration |
| 12003.4 | DuelingPERDDQN | DQN | Deep Q-learning from Demonstrations |
| 11825.0 | DQN | DQN | Noisy Networks for Exploration |
| 8777.4 | DQN2015 | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 8520 | DQN2015 | DQN | Human-level control through deep reinforcement learning |
| 8215.4 | DDQN | DQN | Deep Reinforcement Learning with Double Q-learning |
| 7992.0 | A3C | PG | Noisy Networks for Exploration |
| 7810.3 | DQfD | Imitation | Deep Q-learning from Demonstrations |
| 5279.0 | GorilaDQN | DQN | Massively Parallel Methods for Deep Reinforcement Learning |
| 2412.5 | Human | Human | Dueling Network Architectures for Deep Reinforcement Learning |
| 2368 | Contingency | Misc | Human-level control through deep reinforcement learning |
| 2321 | Human | Human | Human-level control through deep reinforcement learning |
| 1288 | Linear | Misc | Human-level control through deep reinforcement learning |
| 257.6 | Random | Random | Human-level control through deep reinforcement learning |
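
The scores above follow the no-op starts evaluation protocol from the DQN literature: each test episode is prefixed with a random number of no-op actions (at most 30) so the agent does not always begin from the identical start state. Below is a minimal sketch of that protocol, assuming the classic Gym step API; the `policy` argument is a hypothetical user-supplied function mapping observations to actions.

```python
# Sketch of the no-op starts evaluation protocol, assuming the classic Gym
# API. Each episode begins with 1..MAX_NOOPS no-op actions before the policy
# takes over, as in the DQN evaluation setup.
import random
import gym

NOOP_ACTION = 0   # action 0 is NOOP in the ALE action set
MAX_NOOPS = 30

def evaluate_noop_start(env, policy, rng=random):
    obs = env.reset()
    for _ in range(rng.randint(1, MAX_NOOPS)):
        obs, _, done, _ = env.step(NOOP_ACTION)
        if done:                      # rare: reset if the no-ops end the episode
            obs = env.reset()
    episode_return, done = 0.0, False
    while not done:
        obs, reward, done, _ = env.step(policy(obs))
        episode_return += reward
    return episode_return
```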

Normal Starts

| Result | Method | Type | Score from |
|---|---|---|---|
| 37802.3 | ACER | PG | Proximal Policy Optimization Algorithms |
| 2932.9 | PPO | PG | Proximal Policy Optimization Algorithms |
| 1500.9 | A2C | PG | Proximal Policy Optimization Algorithms |