## Overview

The gameplay of Ms. Pac-Man is very similar to that of the original Pac-Man. The player earns points by eating pellets while avoiding ghosts; contact with one costs Ms. Pac-Man a life. Eating an energizer (or "power pellet") turns the ghosts blue, allowing them to be eaten for extra points. Bonus fruits, worth increasing point values, can be eaten twice per round. As the rounds progress, the game speeds up, and the duration of the ghosts' vulnerability after an energizer generally shrinks, eventually vanishing altogether.

*Description from Wikipedia*
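
All of the results below are on the Arcade Learning Environment (ALE) version of the game. As a point of reference for the "Random" rows in the tables, here is a minimal sketch of scoring a uniformly random policy; it assumes the `gymnasium` and `ale-py` packages, which this document does not itself prescribe.

```python
# Minimal sketch of loading Ms. Pac-Man and scoring a random policy.
# Assumes gymnasium and ale-py (recent enough to provide gym.register_envs);
# these packages are this example's assumption, not something the
# leaderboard specifies.
import gymnasium as gym
import ale_py

gym.register_envs(ale_py)  # expose the ALE/* environment ids

env = gym.make("ALE/MsPacman-v5")
obs, info = env.reset(seed=0)

total = 0.0
terminated = truncated = False
while not (terminated or truncated):
    action = env.action_space.sample()  # uniform random action
    obs, reward, terminated, truncated, info = env.step(action)
    total += reward

print(f"Random-policy episode score: {total}")
env.close()
```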

## State of the Art

### Human Starts

Episodes are evaluated from start states sampled from human play, following Massively Parallel Methods for Deep Reinforcement Learning; a sketch of the protocol follows the table.

| Result | Method | Type | Score from |
|--------|--------|------|------------|
| 15375.0 | Human | Human | Massively Parallel Methods for Deep Reinforcement Learning |
| 6135.4 | ApeX DQN | DQN | Distributed Prioritized Experience Replay |
| 2570.2 | RainbowDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 2250.6 | DuelingDDQN | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 2064.1 | DistributionalDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 1865.9 | PERDDQN (rank) | DQN | Prioritized Experience Replay |
| 1824.6 | PERDDQN (prop) | DQN | Prioritized Experience Replay |
| 1263.05 | GorilaDQN | DQN | Massively Parallel Methods for Deep Reinforcement Learning |
| 1241.3 | DDQN | DQN | Deep Reinforcement Learning with Double Q-learning |
| 1092.3 | DQN2015 | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 1012.1 | NoisyNetDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 1007.8 | DuelingPERDQN | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 964.7 | PERDQN (rank) | DQN | Prioritized Experience Replay |
| 850.7 | A3C LSTM | PG | Asynchronous Methods for Deep Reinforcement Learning |
| 763.5 | DQN2015 | DQN | Massively Parallel Methods for Deep Reinforcement Learning |
| 653.7 | A3C FF (4 days) | PG | Asynchronous Methods for Deep Reinforcement Learning |
| 594.4 | A3C FF (1 day) | PG | Asynchronous Methods for Deep Reinforcement Learning |
| 197.8 | Random | Random | Massively Parallel Methods for Deep Reinforcement Learning |
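
The human-starts numbers above come from evaluations that resume play from states recorded during human games, so agents cannot simply memorize one deterministic trajectory. A minimal sketch of the idea, with random play standing in both for the recorded human trajectories and for a trained agent, purely so the example runs end to end:

```python
# Hedged sketch of the human-starts protocol, not the actual evaluation
# code behind these numbers. Emulator snapshots taken during (here: random)
# play stand in for states recorded from human games.
import random
import gymnasium as gym
import ale_py

gym.register_envs(ale_py)
env = gym.make("ALE/MsPacman-v5")
ale = env.unwrapped.ale  # low-level ALE handle for emulator state snapshots

# Collect stand-in "human" start states by snapshotting a play-through.
env.reset(seed=0)
start_states = []
for t in range(500):
    _, _, terminated, truncated, _ = env.step(env.action_space.sample())
    if terminated or truncated:
        env.reset()
    if t % 50 == 0:
        start_states.append(ale.cloneState())  # snapshot emulator state

# Evaluate one episode resumed from a randomly chosen start state.
env.reset()
ale.restoreState(random.choice(start_states))
total, terminated, truncated = 0.0, False, False
while not (terminated or truncated):
    _, reward, terminated, truncated, _ = env.step(env.action_space.sample())
    total += reward

print(f"Human-start episode score: {total}")
env.close()
```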

### No-op Starts

Episodes begin with a random number (up to 30) of no-op actions, following Human-level control through deep reinforcement learning; a sketch follows the table.

| Result | Method | Type | Score from |
|--------|--------|------|------------|
| 15693 | Human | Human | Human-level control through deep reinforcement learning |
| 11255.2 | ApeX DQN | DQN | Distributed Prioritized Experience Replay |
| 6951.6 | Human | Human | Dueling Network Architectures for Deep Reinforcement Learning |
| 6518.7 | PERDDQN (rank) | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 6283.5 | DuelingDDQN | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 5546.0 | NoisyNet-DuelingDQN | DQN | Noisy Networks for Exploration |
| 5380.4 | RainbowDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 4963.8 | DDQN+PopArt | DQN | Learning values across many orders of magnitude |
| 4751.2 | PERDDQN (prop) | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 4695.7 | DQfD | Imitation | Deep Q-learning from Demonstrations |
| 3769.2 | DistributionalDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 3684.2 | DuelingPERDDQN | DQN | Deep Q-learning from Demonstrations |
| 3650.0 | DuelingDQN | DQN | Noisy Networks for Exploration |
| 3415.0 | C51 | Misc | A Distributional Perspective on Reinforcement Learning |
| 3401.0 | NoisyNet-A3C | PG | Noisy Networks for Exploration |
| 3327.3 | DuelingPERDQN | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 3233.5 | GorilaDQN | DQN | Massively Parallel Methods for Deep Reinforcement Learning |
| 3085.6 | DQN2015 | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 2722.0 | NoisyNet-DQN | DQN | Noisy Networks for Exploration |
| 2711.4 | DDQN | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 2674.0 | DQN | DQN | Noisy Networks for Exploration |
| 2501.6 | NoisyNetDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 2436.0 | A3C | PG | Noisy Networks for Exploration |
| 2311 | DQN2015 | DQN | Human-level control through deep reinforcement learning |
| 1692 | Linear | Misc | Human-level control through deep reinforcement learning |
| 1227 | Contingency | Misc | Human-level control through deep reinforcement learning |
| 321.0 | DDQN | DQN | Deep Reinforcement Learning with Double Q-learning |
| 307.3 | Random | Random | Human-level control through deep reinforcement learning |
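
The no-op-starts numbers above come from evaluations where each episode opens with a random-length prefix of do-nothing actions (action 0 in ALE), which perturbs the initial conditions just enough to prevent memorizing a single deterministic start. A minimal sketch, again with a random policy standing in for a trained agent:

```python
# Hedged sketch of the no-op-starts protocol, not the actual evaluation
# code behind these numbers. Up to 30 no-ops are applied before the agent
# acts; a random policy stands in for a trained agent.
import random
import gymnasium as gym
import ale_py

gym.register_envs(ale_py)
env = gym.make("ALE/MsPacman-v5")

NOOP = 0  # action 0 is NOOP in ALE
env.reset(seed=0)
for _ in range(random.randint(1, 30)):  # random-length no-op prefix
    env.step(NOOP)

total, terminated, truncated = 0.0, False, False
while not (terminated or truncated):
    _, reward, terminated, truncated, _ = env.step(env.action_space.sample())
    total += reward

print(f"No-op-start episode score: {total}")
env.close()
```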

### Normal Starts

Episodes begin from the environment's standard initial state, with no injected start-state randomization.

| Result | Method | Type | Score from |
|--------|--------|------|------------|
| 2718.5 | ACER | PG | Proximal Policy Optimization Algorithms |
| 2096.5 | PPO | PG | Proximal Policy Optimization Algorithms |
| 1626.9 | A2C | PG | Proximal Policy Optimization Algorithms |