Overview

Tennis offers singles matches for one or two players; one player is colored pink, the other blue. The game has two user-selectable speed levels. When serving and returning shots, the tennis players automatically swing forehand or backhand as the situation demands, and all shots automatically clear the net and land in bounds.

The first player to win one six-game set is declared the winner of the match (if the set ends in a 6-6 tie, the set restarts from 0-0). This differs from professional tennis, in which a player must win at least two of three six-game sets.

Description from Wikipedia
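
The scores below are reported on the Arcade Learning Environment version of Tennis. As a minimal sketch (assuming the Gymnasium Atari bindings are installed via `pip install "gymnasium[atari]"` and `ale-py`, and assuming the environment id `ALE/Tennis-v5`, which is not stated in this document), the environment can be loaded and a uniformly random policy, corresponding to the Random baseline rows in the tables below, can be run as follows:

```python
# Minimal sketch: run a random policy on Atari Tennis via Gymnasium + ALE.
# Assumptions: gymnasium and ale-py are installed, and the environment is
# registered under the id "ALE/Tennis-v5" (current ALE naming convention).
import gymnasium as gym
import ale_py  # importing ale-py registers the ALE/* environments with Gymnasium

env = gym.make("ALE/Tennis-v5")
obs, info = env.reset(seed=0)

episode_return = 0.0
terminated = truncated = False
while not (terminated or truncated):
    action = env.action_space.sample()  # uniformly random action
    obs, reward, terminated, truncated, info = env.step(action)
    episode_return += reward            # accumulate per-point rewards

env.close()
print(f"Random-policy episode return: {episode_return}")
```

A random policy loses nearly every point, which is consistent with the Random baseline scores of roughly -21 to -24 reported below.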

State of the Art

Human Starts

| Result | Method | Type | Score from |
| --- | --- | --- | --- |
| 23.0 | ApeX DQN | DQN | Distributed Prioritized Experience Replay |
| 22.6 | DistributionalDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 11.1 | DQN2015 | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 4.4 | DuelingDDQN | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| -0.69 | GorilaDQN | DQN | Massively Parallel Methods for Deep Reinforcement Learning |
| -2.0 | PERDDQN (prop) | DQN | Prioritized Experience Replay |
| -2.1 | NoisyNetDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| -2.2 | RainbowDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| -2.3 | DQN2015 | DQN | Massively Parallel Methods for Deep Reinforcement Learning |
| -2.3 | PERDQN (rank) | DQN | Prioritized Experience Replay |
| -5.3 | PERDDQN (rank) | DQN | Prioritized Experience Replay |
| -6.3 | A3C FF (4 days) | PG | Asynchronous Methods for Deep Reinforcement Learning |
| -6.4 | A3C LSTM | PG | Asynchronous Methods for Deep Reinforcement Learning |
| -6.7 | Human | Human | Massively Parallel Methods for Deep Reinforcement Learning |
| -7.8 | DDQN | DQN | Deep Reinforcement Learning with Double Q-learning |
| -10.2 | A3C FF (1 day) | PG | Asynchronous Methods for Deep Reinforcement Learning |
| -13.2 | DuelingPERDQN | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| -21.4 | Random | Random | Massively Parallel Methods for Deep Reinforcement Learning |

No-op Starts

| Result | Method | Type | Score from |
| --- | --- | --- | --- |
| 23.9 | ApeX DQN | DQN | Distributed Prioritized Experience Replay |
| 23.6 | DistributionalDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 23.1 | C51 | Misc | A Distributional Perspective on Reinforcement Learning |
| 12.2 | DQN2015 | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 12.1 | DDQN+PopArt | DQN | Learning values across many orders of magnitude |
| 10.87 | GorilaDQN | DQN | Massively Parallel Methods for Deep Reinforcement Learning |
| 8.0 | DQN | DQN | Noisy Networks for Exploration |
| 5.1 | DuelingDDQN | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 1.7 | DDQN | DQN | Deep Reinforcement Learning with Double Q-learning |
| 0.0 | Contingency | Misc | Human-level control through deep reinforcement learning |
| 0.0 | DuelingPERDQN | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 0.0 | PERDDQN (rank) | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 0.0 | NoisyNet-DQN | DQN | Noisy Networks for Exploration |
| 0.0 | NoisyNet-A3C | PG | Noisy Networks for Exploration |
| 0.0 | DuelingDQN | DQN | Noisy Networks for Exploration |
| 0.0 | NoisyNet-DuelingDQN | DQN | Noisy Networks for Exploration |
| 0.0 | PERDDQN (prop) | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 0.0 | NoisyNetDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| -0.0 | RainbowDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| -0.1 | Linear | Misc | Human-level control through deep reinforcement learning |
| -2.5 | DQN2015 | DQN | Human-level control through deep reinforcement learning |
| -6.0 | A3C | PG | Noisy Networks for Exploration |
| -8.9 | Human | Human | Human-level control through deep reinforcement learning |
| -22.8 | DDQN | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| -23.8 | Random | Random | Human-level control through deep reinforcement learning |

Normal Starts

| Result | Method | Type | Score from |
| --- | --- | --- | --- |
| -14.8 | PPO | PG | Proximal Policy Optimization Algorithms |
| -17.6 | ACER | PG | Proximal Policy Optimization Algorithms |
| -22.2 | A2C | PG | Proximal Policy Optimization Algorithms |