Overview

The player controls Road Runner, who is chased by Wile E. Coyote. To escape, Road Runner runs endlessly to the left. While avoiding Wile E. Coyote, the player must pick up bird seed on the street, avoid obstacles like cars, and get through mazes. Sometimes Wile E. Coyote simply chases Road Runner, but he occasionally uses tools such as rockets, roller skates, and pogo sticks.

Description from Wikipedia
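The scores below are episode returns on the Atari 2600 game as exposed by the Arcade Learning Environment. As a point of reference, here is a minimal sketch of running one random-policy episode; the gymnasium and ale-py packages and the ALE/RoadRunner-v5 environment id are assumptions about tooling, not something specified by the papers cited in the tables.

```python
# Minimal sketch: one random-policy episode of Road Runner.
# Assumes `pip install gymnasium ale-py`; nothing here comes from the cited papers.
import gymnasium as gym
import ale_py

gym.register_envs(ale_py)  # expose the ALE/... environment ids to gymnasium
env = gym.make("ALE/RoadRunner-v5")

obs, info = env.reset(seed=0)
episode_return = 0.0
terminated = truncated = False
while not (terminated or truncated):
    action = env.action_space.sample()  # stand-in for a trained agent's policy
    obs, reward, terminated, truncated, info = env.step(action)
    episode_return += reward

env.close()
print(f"episode return: {episode_return}")  # compare with the Random rows below
```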

State of the Art

Human Starts

Evaluation episodes in this regime begin from start states sampled from human play, following the protocol of Massively Parallel Methods for Deep Reinforcement Learning.

| Result | Method | Type | Score from |
|---|---|---|---|
| 127111.5 | ApeX DQN | DQN | Distributed Prioritized Experience Replay |
| 73949.0 | A3C LSTM | PG | Asynchronous Methods for Deep Reinforcement Learning |
| 58549.0 | DuelingDDQN | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 57207.0 | PERDQN (rank) | DQN | Prioritized Experience Replay |
| 56990.0 | PERDDQN (prop) | DQN | Prioritized Experience Replay |
| 56086.0 | DistributionalDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 54630.0 | DuelingPERDQN | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 54261.0 | RainbowDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 52264.0 | PERDDQN (rank) | DQN | Prioritized Experience Replay |
| 43156.0 | DDQN | DQN | Deep Reinforcement Learning with Double Q-learning |
| 43079.8 | GorilaDQN | DQN | Massively Parallel Methods for Deep Reinforcement Learning |
| 35376.5 | NoisyNetDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 35215.0 | DQN2015 | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 34216.0 | A3C FF (4 days) | PG | Asynchronous Methods for Deep Reinforcement Learning |
| 31769.0 | A3C FF (1 day) | PG | Asynchronous Methods for Deep Reinforcement Learning |
| 9264.0 | DQN2015 | DQN | Massively Parallel Methods for Deep Reinforcement Learning |
| 6878.0 | Human | Human | Massively Parallel Methods for Deep Reinforcement Learning |
| 200.0 | Random | Random | Massively Parallel Methods for Deep Reinforcement Learning |

No-op Starts

Evaluation episodes in this regime begin with a random number of up to 30 no-op actions before the agent takes control, following the protocol of Human-level control through deep reinforcement learning.
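A sketch of this protocol under the same assumed gymnasium/ale-py setup as above; the helper name reset_with_noops is hypothetical, and note that the v5 ALE environments also add sticky actions by default, which the original papers did not use.

```python
# Minimal sketch of the no-op starts protocol: hold NOOP for a random
# number of frames after reset, then hand control to the agent.
# `reset_with_noops` is a hypothetical helper, not from the cited papers.
import random

import gymnasium as gym
import ale_py

gym.register_envs(ale_py)
NOOP = 0  # action 0 is NOOP in ALE's action set

def reset_with_noops(env, max_noops=30, rng=None):
    rng = rng or random.Random()
    obs, info = env.reset()
    for _ in range(rng.randint(1, max_noops)):
        obs, _, terminated, truncated, info = env.step(NOOP)
        if terminated or truncated:  # restart in the unlikely case the no-ops end the episode
            obs, info = env.reset()
    return obs, info

env = gym.make("ALE/RoadRunner-v5")
obs, info = reset_with_noops(env)  # the agent takes over from this state
env.close()
```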

| Result | Method | Type | Score from |
|---|---|---|---|
| 234352.0 | NoisyNet-DuelingDQN | DQN | Noisy Networks for Exploration |
| 222234.5 | ApeX DQN | DQN | Distributed Prioritized Experience Replay |
| 69524.0 | DuelingDDQN | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 64051.0 | DuelingDQN | DQN | Noisy Networks for Exploration |
| 63366.0 | DistributionalDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 62785.0 | PERDDQN (prop) | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 62151.0 | DuelingPERDQN | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 62041.0 | RainbowDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 57608.0 | PERDQN (rank) | DQN | Prioritized Experience Replay |
| 57608.0 | PERDDQN (rank) | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 55839.0 | C51 | Misc | A Distributional Perspective on Reinforcement Learning |
| 53446.0 | ACKTR | PG | Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation |
| 51688.4 | DuelingPERDDQN | DQN | Deep Q-Learning from Demonstrations |
| 51007.99 | GorilaDQN | DQN | Massively Parallel Methods for Deep Reinforcement Learning |
| 50199.6 | DQfD | Imitation | Deep Q-Learning from Demonstrations |
| 48377.0 | DDQN | DQN | Deep Reinforcement Learning with Double Q-learning |
| 47770.0 | DDQN+PopArt | DQN | Learning values across many orders of magnitude |
| 45993.0 | NoisyNet-DQN | DQN | Noisy Networks for Exploration |
| 45315.0 | A3C | PG | Noisy Networks for Exploration |
| 44127.0 | DDQN | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 41681.0 | NoisyNetDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 39544.0 | DQN2015 | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 37910.0 | DQN | DQN | Noisy Networks for Exploration |
| 30454.0 | NoisyNet-A3C | PG | Noisy Networks for Exploration |
| 18257.0 | DQN2015 | DQN | Human-level control through deep reinforcement learning |
| 7845.0 | Human | Human | Human-level control through deep reinforcement learning |
| 89.1 | Contingency | Misc | Human-level control through deep reinforcement learning |
| 67.7 | Linear | Misc | Human-level control through deep reinforcement learning |
| 11.5 | Random | Random | Human-level control through deep reinforcement learning |

Normal Starts

Evaluation episodes in this regime simply begin from the environment's initial state, as in the evaluation setup of Proximal Policy Optimization Algorithms.

| Result | Method | Type | Score from |
|---|---|---|---|
| 35466.0 | ACER | PG | Proximal Policy Optimization Algorithms |
| 32810.0 | A2C | PG | Proximal Policy Optimization Algorithms |
| 25076.0 | PPO | PG | Proximal Policy Optimization Algorithms |