Overview

Ice Hockey is a two-on-two ice hockey game. One player on each team is the goalie, and the other plays offense (although the goalie is not confined to the goal). As in the real sport, the object of the game is to take control of the puck and shoot it into the opposing goal to score points. While under a player's control, the puck moves left and right along the blade of the hockey stick, and it can be shot at any of 32 angles, depending on the puck's position on the blade when it is shot.

The human player controls the skater who has the puck (or, on defense, the skater closest to it). The puck can be stolen from its holder, and shots can be blocked with the blade of the hockey stick.

Description from Wikipedia
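The shot-angle mechanic described above can be illustrated with a small sketch. This is purely hypothetical: the function names, the normalized position range, and the angular spread are invented for illustration (the game's actual mapping is not documented here). The idea is simply that the puck's position along the stick blade is quantized into one of 32 discrete shot angles.

```python
# Hypothetical sketch of the 32-angle shot mechanic. All names and
# ranges are illustrative assumptions, not the game's actual logic.

def shot_angle_index(blade_pos: float, n_angles: int = 32) -> int:
    """Map a normalized blade position in [0, 1] to an angle index 0..n_angles-1."""
    blade_pos = min(max(blade_pos, 0.0), 1.0)  # clamp to the blade
    return min(int(blade_pos * n_angles), n_angles - 1)

def shot_angle_degrees(blade_pos: float, spread_deg: float = 90.0) -> float:
    """Convert the index to an angle centered on straight ahead (0 degrees).

    The 90-degree total spread is an assumed value for illustration.
    """
    idx = shot_angle_index(blade_pos)
    step = spread_deg / 31  # 32 angles -> 31 equal steps across the spread
    return -spread_deg / 2 + idx * step
```

With these assumptions, a puck at the far left of the blade shoots at the leftmost angle, a puck at the far right at the rightmost, and intermediate positions select one of the 30 angles in between.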

State of the Art

Human Starts

| Result | Method | Type | Source |
| --- | --- | --- | --- |
| 24.0 | ApeX DQN | DQN | Distributed Prioritized Experience Replay |
| 0.5 | Human | Human | Massively Parallel Methods for Deep Reinforcement Learning |
| 0.5 | DuelingPERDQN | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| -0.1 | DistributionalDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| -0.2 | PERDDQN (rank) | DQN | Prioritized Experience Replay |
| -0.7 | RainbowDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| -1.0 | PERDDQN (prop) | DQN | Prioritized Experience Replay |
| -1.3 | DuelingDDQN | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| -1.6 | DQN2015 | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| -1.7 | A3C LSTM | PG | Asynchronous Methods for Deep Reinforcement Learning |
| -1.72 | GorilaDQN | DQN | Massively Parallel Methods for Deep Reinforcement Learning |
| -2.4 | NoisyNetDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| -2.5 | DDQN | DQN | Deep Reinforcement Learning with Double Q-learning |
| -2.8 | A3C FF (4 days) | PG | Asynchronous Methods for Deep Reinforcement Learning |
| -3.8 | DQN2015 | DQN | Massively Parallel Methods for Deep Reinforcement Learning |
| -3.8 | PERDQN (rank) | DQN | Prioritized Experience Replay |
| -4.7 | A3C FF (1 day) | PG | Asynchronous Methods for Deep Reinforcement Learning |
| -9.7 | Random | Random | Massively Parallel Methods for Deep Reinforcement Learning |

No-op Starts

| Result | Method | Type | Source |
| --- | --- | --- | --- |
| 33.0 | ApeX DQN | DQN | Distributed Prioritized Experience Replay |
| 3.0 | NoisyNet-DuelingDQN | DQN | Noisy Networks for Exploration |
| 1.3 | PER | DQN | Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation |
| 1.3 | PERDDQN (rank) | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 1.3 | DistributionalDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 1.1 | RainbowDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 0.9 | Human | Human | Human-level control through deep reinforcement learning |
| 0.5 | DuelingDDQN | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 0.3 | PERDDQN (prop) | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| -0.0 | DuelingDQN | DQN | Noisy Networks for Exploration |
| -0.4 | DuelingPERDQN | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| -0.61 | GorilaDQN | DQN | Massively Parallel Methods for Deep Reinforcement Learning |
| -1.6 | DQN2015 | DQN | Human-level control through deep reinforcement learning |
| -2.0 | DuelingPERDDQN | DQN | Deep Q-Learning from Demonstrations |
| -2.0 | DQN | DQN | Noisy Networks for Exploration |
| -2.0 | A3C | PG | Noisy Networks for Exploration |
| -2.1 | NoisyNetDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| -2.4 | DDQN | DQN | Deep Reinforcement Learning with Double Q-learning |
| -3.0 | NoisyNet-DQN | DQN | Noisy Networks for Exploration |
| -3.0 | NoisyNet-A3C | PG | Noisy Networks for Exploration |
| -3.2 | Contingency | Misc | Human-level control through deep reinforcement learning |
| -3.5 | C51 | Misc | A Distributional Perspective on Reinforcement Learning |
| -4.1 | DDQN+PopArt | DQN | Learning values across many orders of magnitude |
| -4.2 | ACKTR | PG | Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation |
| -9.5 | Linear | Misc | Human-level control through deep reinforcement learning |
| -9.6 | DQfD | Imitation | Deep Q-Learning from Demonstrations |
| -11.2 | Random | Random | Human-level control through deep reinforcement learning |
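Because raw Atari scores vary widely in scale across games, the papers above typically compare agents via the human-normalized score, where 0 corresponds to the random baseline and 1 to the human baseline. A minimal sketch, using the random and human scores from the tables above:

```python
# Human-normalized score, as used throughout the DQN line of papers:
# 0.0 matches the random agent, 1.0 matches the human baseline.

def human_normalized(agent: float, random_score: float, human: float) -> float:
    return (agent - random_score) / (human - random_score)

# Baselines taken from the tables above (Ice Hockey).
apex_human_starts = human_normalized(24.0, random_score=-9.7, human=0.5)
apex_noop_starts = human_normalized(33.0, random_score=-11.2, human=0.9)
```

Under both start regimes, ApeX DQN scores more than three times the human baseline on this normalized scale.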

Normal Starts

| Result | Method | Type | Source |
| --- | --- | --- | --- |
| -4.2 | PPO | PG | Proximal Policy Optimization Algorithms |
| -5.9 | ACER | PG | Proximal Policy Optimization Algorithms |
| -6.4 | A2C | PG | Proximal Policy Optimization Algorithms |