Overview

Boxing is an Atari 2600 video game based on the sport of boxing. The game was designed by Activision programmer Bob Whitehead. Boxing shows a top-down view of two boxers, one white and one black. When close enough, a boxer can hit his opponent with a punch (executed by pressing the fire button on the Atari joystick), which causes the opponent to reel back slightly. Long punches score one point, while closer punches (‘power punches’, according to the manual) score two. There are no knockdowns or rounds. A match ends either when one player lands 100 punches (a ‘knockout’) or when two minutes have elapsed (a ‘decision’). In the case of a decision, the player who has landed more punches is the winner. Ties are possible.
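
Stated as code, the win conditions above come down to two checks on the punch counts and the clock. The snippet below is a minimal illustrative sketch of those rules, not the game's actual implementation; the function and constant names are invented for clarity.

```python
def match_outcome(p1_punches: int, p2_punches: int, elapsed_seconds: float) -> str:
    """Outcome of a Boxing match under the rules described above (illustrative only)."""
    KO_PUNCHES = 100     # landing 100 punches ends the match immediately (a knockout)
    TIME_LIMIT = 120.0   # two minutes, after which the match goes to a decision

    # Knockout: the first boxer to land 100 punches wins outright.
    if p1_punches >= KO_PUNCHES:
        return "player 1 wins by knockout"
    if p2_punches >= KO_PUNCHES:
        return "player 2 wins by knockout"

    # Decision: when time runs out, the higher punch count wins; ties are possible.
    if elapsed_seconds >= TIME_LIMIT:
        if p1_punches > p2_punches:
            return "player 1 wins by decision"
        if p2_punches > p1_punches:
            return "player 2 wins by decision"
        return "tie"

    return "in progress"
```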

While the gameplay is simple, there are subtleties, such as getting an opponent on the ‘ropes’ and ‘juggling’ him back and forth with alternating punches. Boxing was made available on Microsoft’s Game Room service for its Xbox 360 console and for Windows-based PCs on September 1, 2010.

Description from Wikipedia

State of the Art

Human Starts

Scores under the human-starts evaluation protocol, in which each test episode begins from a start state sampled from human play.

| Result | Method | Type | Score from |
|---|---|---|---|
| 80.9 | ApeX DQN | DQN | Distributed Prioritized Experience Replay |
| 79.2 | DuelingPERDQN | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 77.3 | DuelingDDQN | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 74.2 | GorilaDQN | DQN | Massively Parallel Methods for Deep Reinforcement Learning |
| 73.5 | DDQN | DQN | Deep Reinforcement Learning with Double Q-learning |
| 72.3 | PERDDQN (rank) | DQN | Prioritized Experience Replay |
| 70.3 | DQN2015 | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 69.6 | PERDQN (rank) | DQN | Prioritized Experience Replay |
| 68.6 | PERDDQN (prop) | DQN | Prioritized Experience Replay |
| 66.3 | NoisyNetDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 62.1 | DistributionalDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 59.8 | A3C FF (4 days) | PG | Asynchronous Methods for Deep Reinforcement Learning |
| 54.9 | RainbowDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 37.3 | A3C LSTM | PG | Asynchronous Methods for Deep Reinforcement Learning |
| 33.7 | A3C FF (1 day) | PG | Asynchronous Methods for Deep Reinforcement Learning |
| 25.8 | DQN2015 | DQN | Massively Parallel Methods for Deep Reinforcement Learning |
| 9.6 | Human | Human | Massively Parallel Methods for Deep Reinforcement Learning |
| -1.5 | Random | Random | Massively Parallel Methods for Deep Reinforcement Learning |

No-op Starts

Scores under the 30 no-op evaluation protocol, in which each test episode begins with a random number (up to 30) of no-op actions.

| Result | Method | Type | Score from |
|---|---|---|---|
| 100.0 | ApeX DQN | DQN | Distributed Prioritized Experience Replay |
| 100.0 | NoisyNet-A3C | PG | Noisy Networks for Exploration |
| 100.0 | NoisyNet-DuelingDQN | DQN | Noisy Networks for Exploration |
| 99.6 | RainbowDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 99.4 | DuelingDDQN | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 99.3 | DuelingPERDDQN | DQN | Deep Q-Learning from Demonstrations |
| 99.3 | DDQN+PopArt | DQN | Learning values across many orders of magnitude |
| 99.1 | DQfD | Imitation | Deep Q-Learning from Demonstrations |
| 99.0 | DuelingDQN | DQN | Noisy Networks for Exploration |
| 98.9 | DuelingPERDQN | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 98.8 | PERDDQN (prop) | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 98.1 | DistributionalDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 97.8 | C51 | Misc | A Distributional Perspective on Reinforcement Learning |
| 95.6 | PER | DQN | Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation |
| 95.6 | PERDDQN (rank) | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 94.88 | GorilaDQN | DQN | Massively Parallel Methods for Deep Reinforcement Learning |
| 91.6 | DDQN | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 91.0 | A3C | PG | Noisy Networks for Exploration |
| 89.0 | NoisyNet-DQN | DQN | Noisy Networks for Exploration |
| 88.0 | DQN2015 | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 87.0 | DQN | DQN | Noisy Networks for Exploration |
| 83.3 | NoisyNetDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 81.7 | DDQN | DQN | Deep Reinforcement Learning with Double Q-learning |
| 71.8 | DQN2015 | DQN | Human-level control through deep reinforcement learning |
| 44 | Linear | Misc | Human-level control through deep reinforcement learning |
| 12.1 | Human | Human | Dueling Network Architectures for Deep Reinforcement Learning |
| 9.8 | Contingency | Misc | Human-level control through deep reinforcement learning |
| 4.3 | Human | Human | Human-level control through deep reinforcement learning |
| 1.45 | ACKTR | PG | Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation |
| 0.1 | Random | Random | Human-level control through deep reinforcement learning |

Normal Starts

Scores reported using the cited paper's standard evaluation setup, without the human-starts or no-op start protocols above.

| Result | Method | Type | Score from |
|---|---|---|---|
| 98.9 | ACER | PG | Proximal Policy Optimization Algorithms |
| 94.6 | PPO | PG | Proximal Policy Optimization Algorithms |
| 17.7 | A2C | PG | Proximal Policy Optimization Algorithms |
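
For context on the baseline rows above, a uniformly random policy on Boxing can be evaluated in a few lines. The sketch below is illustrative only: it assumes the Gymnasium and ale-py packages and the `ALE/Boxing-v5` environment id they register, and it follows the up-to-30 no-op start convention from the No-op Starts section, but it does not reproduce the exact preprocessing (frame skip, sticky actions, episode caps) used by the cited papers.

```python
import gymnasium as gym
import numpy as np

def evaluate_random_policy(episodes: int = 10, seed: int = 0) -> float:
    """Average episodic return of a uniformly random policy on Boxing,
    using up-to-30 no-op starts (the "No-op Starts" protocol above)."""
    # Assumes Gymnasium with the Atari extras installed; depending on the ale-py
    # version, the ALE environments may need explicit registration first
    # (import ale_py; gym.register_envs(ale_py)).
    env = gym.make("ALE/Boxing-v5")
    env.action_space.seed(seed)
    rng = np.random.default_rng(seed)
    returns = []
    for ep in range(episodes):
        obs, info = env.reset(seed=seed + ep)
        total, terminated, truncated = 0.0, False, False
        # No-op starts: a random number (1-30) of no-op actions before play begins.
        # Action index 0 is NOOP in the ALE action set.
        for _ in range(rng.integers(1, 31)):
            obs, reward, terminated, truncated, info = env.step(0)
            total += float(reward)
        while not (terminated or truncated):
            action = env.action_space.sample()  # random policy
            obs, reward, terminated, truncated, info = env.step(action)
            total += float(reward)
        returns.append(total)
    env.close()
    return float(np.mean(returns))

if __name__ == "__main__":
    print(f"Mean return over 10 episodes: {evaluate_random_policy():.1f}")
```

Swapping `env.action_space.sample()` for a trained agent's action selection gives a rough way to compare a policy against the tables above, keeping in mind the evaluation differences noted in the lead-in.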