Overview

The game is based on the sport of bowling and can be played by one player or by two players alternating turns.

In all six variations, games last for 10 frames, or turns. At the start of each frame, the current player is given two chances to roll a bowling ball down an alley in an attempt to knock down as many of the ten bowling pins as possible. The bowler (on the left side of the screen) may move up and down his end of the alley to aim before releasing the ball. In four of the game’s six variations, the ball can be steered before it hits the pins. Knocking down every pin on the first shot is a strike, while knocking every pin down in both shots is a spare. The player’s score is determined by the number of pins knocked down in all 10 frames, as well as the number of strikes and spares acquired.

Description from Wikipedia
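
The benchmark scores below are collected on the Arcade Learning Environment version of the game. As a point of reference, here is a minimal sketch of loading the environment and measuring the return of a random policy; it assumes gymnasium with the Atari extras installed (`pip install "gymnasium[atari,accept-rom-license]"`), and the environment id `ALE/Bowling-v5` reflects current ale-py naming rather than anything specified on this page.

```python
# Minimal sketch: score a random policy on ALE Bowling for one episode.
# Assumes `pip install "gymnasium[atari,accept-rom-license]"`.
import gymnasium as gym

env = gym.make("ALE/Bowling-v5")
obs, info = env.reset(seed=0)

total_reward = 0.0
terminated = truncated = False
while not (terminated or truncated):
    action = env.action_space.sample()  # random agent, purely for illustration
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward

print(f"Episode return: {total_reward}")  # comparable to the Random rows below
env.close()
```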

State of the Art

Human Starts

| Result | Method | Type | Score from |
|--------|--------|------|------------|
| 146.5  | Human | Human | Massively Parallel Methods for Deep Reinforcement Learning |
| 79.3   | NoisyNetDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 76.8   | DistributionalDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 69.6   | DDQN | DQN | Deep Reinforcement Learning with Double Q-learning |
| 65.8   | PERDDQN (prop) | DQN | Prioritized Experience Replay |
| 65.7   | DuelingDDQN | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 58.0   | PERDQN (rank) | DQN | Prioritized Experience Replay |
| 56.5   | DQN2015 | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 53.95  | GorilaDQN | DQN | Massively Parallel Methods for Deep Reinforcement Learning |
| 52.0   | PERDDQN (rank) | DQN | Prioritized Experience Replay |
| 50.4   | DuelingPERDQN | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 41.8   | A3C LSTM | PG | Asynchronous Methods for Deep Reinforcement Learning |
| 41.2   | DQN2015 | DQN | Massively Parallel Methods for Deep Reinforcement Learning |
| 39.4   | RainbowDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 36.2   | A3C FF (1 day) | PG | Asynchronous Methods for Deep Reinforcement Learning |
| 35.2   | Random | Random | Massively Parallel Methods for Deep Reinforcement Learning |
| 35.1   | A3C FF (4 days) | PG | Asynchronous Methods for Deep Reinforcement Learning |
| 30.2   | ApeX DQN | DQN | Distributed Prioritized Experience Replay |
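
For readers unfamiliar with the protocol, "human starts" evaluation (introduced in Massively Parallel Methods for Deep Reinforcement Learning) begins each evaluation episode from an emulator state sampled from human play rather than from the standard reset state. Below is a rough sketch of the idea, assuming gymnasium with ale-py; `human_start_states` (a list of saved ALE states) and the `policy` callable are hypothetical placeholders, and collecting the human states is not shown.

```python
# Hedged sketch of human-starts evaluation: each episode resumes from a saved
# emulator state drawn from human play. `human_start_states` is a hypothetical
# list of ale-py ALEState objects obtained from recorded human trajectories.
import random
import gymnasium as gym

def evaluate_human_starts(policy, human_start_states, episodes=30, seed=0):
    env = gym.make("ALE/Bowling-v5")
    rng = random.Random(seed)
    returns = []
    for ep in range(episodes):
        env.reset(seed=seed + ep)
        # restoreState() is part of the ale-py ALEInterface API
        env.unwrapped.ale.restoreState(rng.choice(human_start_states))
        obs, reward, terminated, truncated, info = env.step(0)  # no-op to refresh obs
        episode_return = float(reward)
        while not (terminated or truncated):
            obs, reward, terminated, truncated, info = env.step(policy(obs))
            episode_return += reward
        returns.append(episode_return)
    env.close()
    return sum(returns) / len(returns)
```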

No-op Starts

| Result | Method | Type | Score from |
|--------|--------|------|------------|
| 160.7  | Human | Human | Dueling Network Architectures for Deep Reinforcement Learning |
| 154.8  | Human | Human | Human-level control through deep reinforcement learning |
| 108.55 | RUDDER | Misc | RUDDER: Return Decomposition for Delayed Rewards |
| 102.1  | DDQN+PopArt | DQN | Learning values across many orders of magnitude |
| 97.0   | DQfD | Imitation | Deep Q-Learning from Demonstrations |
| 81.8   | C51 | Misc | A Distributional Perspective on Reinforcement Learning |
| 77.3   | NoisyNetDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 74.1   | DistributionalDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 72.0   | DuelingDQN | DQN | Noisy Networks for Exploration |
| 71.0   | DuelingPERDDQN | DQN | Deep Q-Learning from Demonstrations |
| 71.0   | NoisyNet-DQN | DQN | Noisy Networks for Exploration |
| 70.5   | DDQN | DQN | Deep Reinforcement Learning with Double Q-learning |
| 68.1   | DDQN | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 68.0   | NoisyNet-DuelingDQN | DQN | Noisy Networks for Exploration |
| 65.5   | DuelingDDQN | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 62.6   | PERDDQN | DQN | RUDDER: Return Decomposition for Delayed Rewards |
| 62.6   | PERDDQN (prop) | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 62.0   | DQN | DQN | Noisy Networks for Exploration |
| 54.01  | GorilaDQN | DQN | Massively Parallel Methods for Deep Reinforcement Learning |
| 50.4   | DQN2015 | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 50.4   | DQN | DQN | RUDDER: Return Decomposition for Delayed Rewards |
| 47.9   | PER | DQN | Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation |
| 47.9   | PERDDQN (rank) | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 46.7   | DuelingPERDQN | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 43.9   | Linear | Misc | Human-level control through deep reinforcement learning |
| 42.4   | DQN2015 | DQN | Human-level control through deep reinforcement learning |
| 42.0   | NoisyNet-A3C | PG | Noisy Networks for Exploration |
| 37.0   | A3C | PG | Noisy Networks for Exploration |
| 36.4   | Contingency | Misc | Human-level control through deep reinforcement learning |
| 30.0   | RainbowDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 24.3   | ACKTR | PG | Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation |
| 23.1   | Random | Random | Human-level control through deep reinforcement learning |
| 17.6   | ApeX DQN | DQN | Distributed Prioritized Experience Replay |
| 4      | ApeX DQN | DQN | RUDDER: Return Decomposition for Delayed Rewards |
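
"No-op starts" refers to the standard DQN evaluation protocol in which each episode begins with a random number of no-op actions (up to 30) before the agent takes control, so the agent cannot rely on a single deterministic start state. A minimal sketch follows, assuming gymnasium with ale-py and that action 0 is NOOP in the default action set; the `policy` callable is a placeholder.

```python
# Hedged sketch of the 30 no-op starts protocol: a random prefix of 1-30 no-op
# actions is executed before the agent acts. Assumes action 0 is NOOP.
import random
import gymnasium as gym

NOOP_ACTION = 0

def evaluate_noop_starts(policy, episodes=30, max_noops=30, seed=0):
    env = gym.make("ALE/Bowling-v5")
    rng = random.Random(seed)
    returns = []
    for ep in range(episodes):
        obs, info = env.reset(seed=seed + ep)
        episode_return, terminated, truncated = 0.0, False, False
        for _ in range(rng.randint(1, max_noops)):  # random no-op prefix
            obs, reward, terminated, truncated, info = env.step(NOOP_ACTION)
            episode_return += reward
            if terminated or truncated:  # extremely unlikely in Bowling
                obs, info = env.reset()
                terminated = truncated = False
        while not (terminated or truncated):
            obs, reward, terminated, truncated, info = env.step(policy(obs))
            episode_return += reward
        returns.append(episode_return)
    env.close()
    return sum(returns) / len(returns)
```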

Normal Starts

| Result | Method | Type | Score from |
|--------|--------|------|------------|
| 40.1   | PPO | PG | Proximal Policy Optimization Algorithms |
| 33.3   | ACER | PG | Proximal Policy Optimization Algorithms |
| 30.1   | A2C | PG | Proximal Policy Optimization Algorithms |