Overview

Double Dunk is a simulation of two-on-two, half-court basketball. Each team has two on-screen characters: a shorter “outside” man and a taller “inside” man. In a single-player game, the player controls whichever character is closest to the ball, either the one holding the ball (on offense) or the one guarding the ball handler (on defense). In two-player games, each player may control one of the two teams, as in a one-player game, or both players may play on the same team against a computer-controlled opponent. At the start of each possession, both the offense and the defense select from a number of plays (such as the “pick and roll” on offense); the offense then attempts to score while the defense tries to regain possession by intercepting or stealing the ball.

Description from Wikipedia
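
All scores below were produced on the Atari 2600 game via the Arcade Learning Environment. For orientation, here is a minimal sketch of driving the environment with a random agent through Gymnasium's ALE bindings; the packages (`gymnasium`, `ale-py`) and the env ID `ALE/DoubleDunk-v5` are assumptions about current tooling, not something taken from the cited papers.

```python
# Minimal sketch: a random agent on Double Dunk via Gymnasium's ALE bindings.
# Assumes `pip install gymnasium ale-py`; nothing here comes from the papers
# cited below.
import ale_py
import gymnasium as gym

gym.register_envs(ale_py)  # required on recent Gymnasium/ale-py versions

env = gym.make("ALE/DoubleDunk-v5")
obs, info = env.reset(seed=0)
terminated = truncated = False
episode_return = 0.0
while not (terminated or truncated):
    action = env.action_space.sample()  # uniform random policy
    obs, reward, terminated, truncated, info = env.step(action)
    episode_return += reward
print(f"Random-policy return: {episode_return}")  # roughly the Random rows below
env.close()
```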

State of the Art

Human Starts

Scores in this condition are evaluated on episodes initialized from start states sampled from human play, so an agent cannot exploit a single deterministic starting state (the protocol introduced in “Massively Parallel Methods for Deep Reinforcement Learning”).

| Result | Method | Type | Score from |
| --- | --- | --- | --- |
| 22.3 | ApeX DQN | DQN | Distributed Prioritized Experience Replay |
| 16.0 | PERDDQN (rank) | DQN | Prioritized Experience Replay |
| 2.7 | PERDDQN (prop) | DQN | Prioritized Experience Replay |
| 0.1 | A3C FF (1 day) | PG | Asynchronous Methods for Deep Reinforcement Learning |
| 0.1 | A3C LSTM | PG | Asynchronous Methods for Deep Reinforcement Learning |
| -0.1 | A3C FF (4 days) | PG | Asynchronous Methods for Deep Reinforcement Learning |
| -0.3 | DDQN | DQN | Deep Reinforcement Learning with Double Q-learning |
| -0.6 | RainbowDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| -0.8 | DuelingDDQN | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| -1.0 | NoisyNetDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| -3.7 | DistributionalDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| -5.3 | PERDQN (rank) | DQN | Prioritized Experience Replay |
| -6.0 | DQN2015 | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| -10.7 | DuelingPERDQN | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| -11.35 | GorilaDQN | DQN | Massively Parallel Methods for Deep Reinforcement Learning |
| -14.4 | Human | Human | Massively Parallel Methods for Deep Reinforcement Learning |
| -16.0 | Random | Random | Massively Parallel Methods for Deep Reinforcement Learning |
| -21.6 | DQN2015 | DQN | Massively Parallel Methods for Deep Reinforcement Learning |
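
A hedged sketch of one way to implement the human-starts reset against the raw ALE interface, assuming `human_states` is a list of emulator snapshots captured with `cloneState()` during recorded human play; the ROM path and helper name are illustrative, not from the papers above.

```python
# Sketch of a human-starts reset, assuming `human_states` holds ALE emulator
# snapshots (from ale.cloneState()) saved at random points of human play.
# The ROM path and the helper name are illustrative.
import random
from ale_py import ALEInterface

ale = ALEInterface()
ale.loadROM("double_dunk.bin")  # assumed local ROM path

def reset_from_human_start(ale, human_states):
    """Begin the episode from a randomly chosen human-play snapshot."""
    ale.reset_game()
    ale.restoreState(random.choice(human_states))
```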

No-op Starts

In this condition each episode begins with a random number of up to 30 no-op actions, the evaluation protocol introduced in “Human-level control through deep reinforcement learning”.

| Result | Method | Type | Score from |
| --- | --- | --- | --- |
| 23.5 | ApeX DQN | DQN | Distributed Prioritized Experience Replay |
| 18.5 | PER | DQN | Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation |
| 18.5 | PERDDQN (rank) | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 17.0 | DuelingDQN | DQN | Noisy Networks for Exploration |
| 4.8 | PERDDQN (prop) | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 3.0 | A3C | PG | Noisy Networks for Exploration |
| 3.0 | NoisyNet-A3C | PG | Noisy Networks for Exploration |
| 2.5 | C51 | Misc | A Distributional Perspective on Reinforcement Learning |
| 1.0 | NoisyNet-DQN | DQN | Noisy Networks for Exploration |
| 1.0 | NoisyNet-DuelingDQN | DQN | Noisy Networks for Exploration |
| 0.1 | DuelingDDQN | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| -0.3 | RainbowDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| -0.54 | ACKTR | PG | Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation |
| -1.8 | NoisyNetDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| -3.8 | DistributionalDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| -6.0 | DQN | DQN | Noisy Networks for Exploration |
| -6.3 | DDQN | DQN | Deep Reinforcement Learning with Double Q-learning |
| -6.6 | DQN2015 | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| -10.62 | GorilaDQN | DQN | Massively Parallel Methods for Deep Reinforcement Learning |
| -11.5 | DDQN+PopArt | DQN | Learning values across many orders of magnitude |
| -12.5 | DuelingPERDQN | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| -13.1 | Linear | Misc | Human-level control through deep reinforcement learning |
| -14.3 | DuelingPERDDQN | DQN | Deep Q-Learning from Demonstrations |
| -15.5 | Human | Human | Human-level control through deep reinforcement learning |
| -16.0 | Contingency | Misc | Human-level control through deep reinforcement learning |
| -18.1 | DQN2015 | DQN | Human-level control through deep reinforcement learning |
| -18.6 | Random | Random | Human-level control through deep reinforcement learning |
| -20.4 | DQfD | Imitation | Deep Q-Learning from Demonstrations |
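
The no-op condition is straightforward to reproduce. Below is a minimal sketch using Gymnasium's ALE bindings, assuming action 0 is NOOP (as it is in ALE action sets); the helper name is illustrative.

```python
# Minimal sketch of the up-to-30 no-op starts protocol; assumes action 0 is
# NOOP (true for ALE action sets). The helper name is illustrative.
import ale_py
import gymnasium as gym
import numpy as np

def reset_with_noops(env, rng, max_noops=30):
    """Reset, then take a random 1..max_noops NOOP steps to vary the start."""
    obs, info = env.reset()
    for _ in range(rng.integers(1, max_noops + 1)):
        obs, _, terminated, truncated, info = env.step(0)
        if terminated or truncated:
            obs, info = env.reset()
    return obs, info

gym.register_envs(ale_py)
env = gym.make("ALE/DoubleDunk-v5")
obs, info = reset_with_noops(env, np.random.default_rng(0))
```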

Normal Starts

In this condition episodes begin from the environment's standard reset, with no no-ops or human starts inserted.

| Result | Method | Type | Score from |
| --- | --- | --- | --- |
| -13.2 | ACER | PG | Proximal Policy Optimization Algorithms |
| -14.9 | PPO | PG | Proximal Policy Optimization Algorithms |
| -16.2 | A2C | PG | Proximal Policy Optimization Algorithms |