Overview

The game is played using a single, diagonally mounted four-way joystick. The player controls Q*bert, who starts each game at the top of a pyramid made of 28 cubes, and moves by hopping diagonally from cube to cube. Landing on a cube causes it to change color, and changing every cube to the target color allows the player to progress to the next stage.
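Since 1 + 2 + … + 7 = 28, the pyramid is a triangular grid of seven rows, and every diagonal hop moves exactly one row up or down. A minimal Python sketch of the board and its legal moves (the coordinate scheme and function names are illustrative, not taken from the game's code):

```python
# Rows are numbered 0 (apex) to 6 (base); row r holds r + 1 cubes,
# for 1 + 2 + ... + 7 = 28 cubes in total.
ROWS = 7

def on_pyramid(r: int, c: int) -> bool:
    """True if (r, c) is a cube; anywhere else is empty space."""
    return 0 <= r < ROWS and 0 <= c <= r

def hops(r: int, c: int):
    """Yield the four diagonal destinations from cube (r, c).

    Down-left/down-right land on (r+1, c) and (r+1, c+1);
    up-left/up-right land on (r-1, c-1) and (r-1, c).
    """
    for dest in [(r + 1, c), (r + 1, c + 1), (r - 1, c - 1), (r - 1, c)]:
        yield dest, on_pyramid(*dest)

# From the apex (0, 0) only the two downward hops stay on the board;
# the other two would send Q*bert off the pyramid.
assert sum(ok for _, ok in hops(0, 0)) == 2
```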

At the beginning, jumping on every cube once is enough to advance. In later stages, each cube must be hit twice to reach the target color. In other stages, cubes change color every time Q*bert lands on them instead of remaining on the target color once they reach it. Subsequent stages combine both elements. Jumping off the pyramid results in the character’s death.
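These stage rules can be modeled as different transition functions over a cube's color state. A hedged sketch of the three variants described above (the rule names and three-state progression are assumptions for illustration; the actual palettes and intermediate colors vary by level):

```python
TARGET = 2  # 0 = start color, 1 = intermediate color, 2 = target color

def step_color(state: int, rule: str) -> int:
    """Advance a cube's color state when Q*bert lands on it."""
    if rule == "single":   # one hop reaches the target, which then sticks
        return TARGET
    if rule == "double":   # two hops needed; the target then sticks
        return min(state + 1, TARGET)
    if rule == "toggle":   # every hop cycles the color, target included
        return (state + 1) % (TARGET + 1)
    raise ValueError(f"unknown rule: {rule!r}")

def stage_clear(cube_states: list[int]) -> bool:
    """The stage ends once every cube shows the target color."""
    return all(s == TARGET for s in cube_states)
```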

[Screenshot: The eponymous Q*bert hops diagonally down the pyramid to avoid Coily, who is pursuing him. The game tracks the player’s progress above the pyramid.]

The player is impeded by several enemies, introduced gradually to the game:

  • Coily – First appears as a purple egg that bounces to the bottom of the pyramid, then transforms into a snake that chases Q*bert.
  • Ugg and Wrongway – Two purple creatures that hop along the sides of the cubes in an Escheresque manner. Starting at either the bottom left or bottom right corner, they keep moving toward the top right or top left side of the pyramid respectively, and fall off the pyramid when they reach the end.
  • Slick and Sam – Two green creatures that descend the pyramid and revert cubes whose color has already been changed.

A collision with purple enemies is fatal to the character, whereas the green enemies are removed from the board upon contact. Colored balls occasionally appear at the second row of cubes and bounce downward; contact with a red ball is lethal to Q*bert, while contact with a green one immobilizes the on-screen enemies for a limited time. Multicolored floating discs on either side of the pyramid serve as an escape from danger, particularly Coily. When Q*bert jumps on a disc, it transports him to the top of the pyramid. If Coily is in close pursuit, he will jump after Q*bert and fall to his death, awarding bonus points. This also causes all enemies and balls on the screen to disappear, though they start to return after a few seconds.

Points are awarded for each color change (25), defeating Coily with a flying disc (500), each unused disc at the end of a stage (50 or 100 at higher stages), catching green balls (100), and catching Slick or Sam (300 each). Bonus points are also awarded for completing a screen, starting at 1,000 for the first screen of Level 1 and increasing by 250 for each subsequent completion. Extra lives are granted for reaching certain scores, which are set by the machine operator.
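The bookkeeping above is simple enough to tally mechanically. A sketch of the scoring rules as stated (the event names and function signatures are made up for illustration):

```python
POINTS = {
    "color_change": 25,     # each cube changed toward the target color
    "coily_defeated": 500,  # luring Coily off the pyramid with a disc
    "green_ball": 100,
    "slick_or_sam": 300,    # per creature caught
}

def screen_bonus(prior_completions: int) -> int:
    """1,000 for the first screen, increasing by 250 per completed screen."""
    return 1000 + 250 * prior_completions

def disc_bonus(remaining_discs: int, per_disc: int = 50) -> int:
    """Unused discs at the end of a stage; worth 50 or 100 at higher stages."""
    return remaining_discs * per_disc

# Example: clearing the very first screen with two discs left over.
total = 28 * POINTS["color_change"] + screen_bonus(0) + disc_bonus(2)
assert total == 700 + 1000 + 100
```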

Description from Wikipedia

State of the Art

Human Starts

| Result | Method | Type | Score from |
| --- | --- | --- | --- |
| 380152.1 | ApeX DQN | DQN | Distributed Prioritized Experience Replay |
| 21307.5 | A3C LSTM | PG | Asynchronous Methods for Deep Reinforcement Learning |
| 18397.6 | RainbowDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 15148.8 | A3C FF (4 days) | PG | Asynchronous Methods for Deep Reinforcement Learning |
| 15035.9 | DistributionalDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 14175.8 | DuelingDDQN | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 14063.0 | DuelingPERDQN | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 13752.3 | A3C FF (1 day) | PG | Asynchronous Methods for Deep Reinforcement Learning |
| 12740.5 | PERDQN (rank) | DQN | Prioritized Experience Replay |
| 12085.0 | Human | Human | Massively Parallel Methods for Deep Reinforcement Learning |
| 11277.0 | PERDDQN (prop) | DQN | Prioritized Experience Replay |
| 11020.8 | DDQN | DQN | Deep Reinforcement Learning with Double Q-learning |
| 9944.0 | PERDDQN (rank) | DQN | Prioritized Experience Replay |
| 9271.5 | DQN2015 | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 9176.6 | NoisyNetDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 7089.83 | GorilaDQN | DQN | Massively Parallel Methods for Deep Reinforcement Learning |
| 4589.8 | DQN2015 | DQN | Massively Parallel Methods for Deep Reinforcement Learning |
| 271.8 | Random | Random | Massively Parallel Methods for Deep Reinforcement Learning |
| 183.0 | Random | Random | Deep Reinforcement Learning with Double Q-learning |

No-op Starts

| Result | Method | Type | Score from |
| --- | --- | --- | --- |
| 302391.3 | ApeX DQN | DQN | Distributed Prioritized Experience Replay |
| 33817.5 | RainbowDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 27121.0 | NoisyNet-DuelingDQN | DQN | Noisy Networks for Exploration |
| 23784.0 | C51 | Misc | A Distributional Perspective on Reinforcement Learning |
| 23151.5 | ACKTR | PG | Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation |
| 21792.7 | DQfD | Imitation | Deep Q-Learning from Demonstrations |
| 20693.7 | DuelingPERDDQN | DQN | Deep Q-Learning from Demonstrations |
| 19819.0 | DuelingDQN | DQN | Noisy Networks for Exploration |
| 19220.3 | DuelingDDQN | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 18802.8 | PERDDQN (prop) | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 18760.3 | DuelingPERDQN | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 18586.0 | A3C | PG | Noisy Networks for Exploration |
| 17896.0 | NoisyNet-A3C | PG | Noisy Networks for Exploration |
| 16956.0 | DistributionalDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 16256.5 | PER | DQN | Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation |
| 16256.5 | PERDDQN (rank) | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 15545.0 | NoisyNet-DQN | DQN | Noisy Networks for Exploration |
| 15276.3 | NoisyNetDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 15088.5 | DDQN | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 14875.0 | DDQN | DQN | Deep Reinforcement Learning with Double Q-learning |
| 13455 | Human | Human | Human-level control through deep reinforcement learning |
| 13117.3 | DQN2015 | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 11241.0 | DQN | DQN | Noisy Networks for Exploration |
| 10815.55 | GorilaDQN | DQN | Massively Parallel Methods for Deep Reinforcement Learning |
| 10596 | DQN2015 | DQN | Human-level control through deep reinforcement learning |
| 5236.8 | DDQN+PopArt | DQN | Learning values across many orders of magnitude |
| 960.3 | Contingency | Misc | Human-level control through deep reinforcement learning |
| 613.5 | Linear | Misc | Human-level control through deep reinforcement learning |
| 163.9 | Random | Random | Human-level control through deep reinforcement learning |
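For context, the no-op starts protocol used throughout the DQN literature begins each evaluation episode with a random number of no-op actions (up to 30) before the agent takes over, so scores are averaged over slightly different start states. A minimal sketch using Gymnasium's ALE bindings (assuming the ale-py package is installed; the environment id follows current ale-py naming):

```python
import random

import ale_py
import gymnasium as gym

gym.register_envs(ale_py)  # exposes the ALE/* environment ids

env = gym.make("ALE/Qbert-v5")
obs, info = env.reset(seed=0)

# No-op starts: execute 1-30 NOOPs (ALE action 0) before the agent acts.
for _ in range(random.randint(1, 30)):
    obs, reward, terminated, truncated, info = env.step(0)
    if terminated or truncated:
        obs, info = env.reset()

# ... hand `obs` to the agent and run the evaluation episode as usual ...
env.close()
```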

Normal Starts

| Result | Method | Type | Score from |
| --- | --- | --- | --- |
| 18900 | Human | Human | Playing Atari with Deep Reinforcement Learning |
| 15316.6 | ACER | PG | Proximal Policy Optimization Algorithms |
| 14293.3 | PPO | PG | Proximal Policy Optimization Algorithms |
| 10065.7 | A2C | PG | Proximal Policy Optimization Algorithms |
| 7732.5 | TRPO (vine) | PG | Trust Region Policy Optimization |
| 1973.5 | TRPO (single path) | PG | Trust Region Policy Optimization |
| 1952 | DQN2013 | DQN | Playing Atari with Deep Reinforcement Learning |
| 960 | Contingency | Misc | Playing Atari with Deep Reinforcement Learning |
| 614 | Sarsa | Misc | Playing Atari with Deep Reinforcement Learning |
| 157 | Random | Random | Playing Atari with Deep Reinforcement Learning |