Atari Beam Rider Environment

Overview

Beamrider takes place above Earth’s atmosphere, where a large alien shield called the Restrictor Shield surrounds the Earth. The player’s objective is to clear the Shield’s 99 sectors of alien craft while piloting the Beamrider ship. The Beamrider is equipped with a short-range laser lariat and a limited supply of torpedoes; the player is given three torpedoes at the start of each sector.

To clear a sector, fifteen enemy ships must be destroyed. A “Sentinel ship” will then appear, which can be destroyed using a torpedo (if any remain) for bonus points. Some enemy ships can only be destroyed with torpedoes, and some must simply be dodged. Occasionally during a sector, “Yellow Rejuvenators” (extra lives) appear. They can be picked up for an extra ship, but if they are shot they will transform into ship-damaging debris.

Description from Wikipedia
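
For reference, the game is exposed as an Atari environment through the Arcade Learning Environment and can be loaded with Gymnasium. The sketch below is a minimal example under that assumption; the `ale-py` package, the `ALE/BeamRider-v5` ID, and the default wrapper settings may differ between releases.

```python
# Minimal sketch: run a random policy in Beam Rider via Gymnasium + ALE.
# Assumes `pip install gymnasium ale-py`; IDs and defaults vary by version.
import gymnasium as gym
import ale_py  # importing ale_py registers the ALE/... environment IDs
               # (newer setups may also call gym.register_envs(ale_py))

env = gym.make("ALE/BeamRider-v5")
obs, info = env.reset(seed=0)

episode_return = 0.0
terminated = truncated = False
while not (terminated or truncated):
    action = env.action_space.sample()  # random policy, just to exercise the env
    obs, reward, terminated, truncated, info = env.step(action)
    episode_return += reward

print(f"Random-policy episode return: {episode_return}")
env.close()
```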

Performances of RL Agents

We list the scores reported for various reinforcement learning algorithms tested in this environment, grouped by evaluation protocol: human starts (evaluation episodes begin from states sampled from human play), no-op starts (episodes begin with up to 30 random no-op actions), and normal starts (episodes begin from the environment’s standard initial state). These results are from the RL Database. If this page was helpful, please consider giving it a star!

Human Starts

| Result | Algorithm | Source |
|---|---|---|
| 37412.2 | PDD DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 31181.3 | Prioritized DDQN (rank, tuned) | Prioritized Experience Replay |
| 26172.7 | Prioritized DDQN (prop, tuned) | Prioritized Experience Replay |
| 24622.2 | A3C LSTM | Asynchronous Methods for Deep Reinforcement Learning |
| 22707.9 | A3C FF | Asynchronous Methods for Deep Reinforcement Learning |
| 21768.5 | Rainbow | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 17417.2 | DDQN (tuned) | Deep Reinforcement Learning with Double Q-learning |
| 15002.4 | Distributional DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 14961.0 | Human | Massively Parallel Methods for Deep Reinforcement Learning |
| 14591.3 | DuDQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 13235.9 | A3C FF 1 day | Asynchronous Methods for Deep Reinforcement Learning |
| 12041.9 | Prioritized DQN (rank) | Prioritized Experience Replay |
| 9107.9 | DDQN | Deep Reinforcement Learning with Double Q-learning |
| 8672.4 | DQN | Massively Parallel Methods for Deep Reinforcement Learning |
| 3822.07 | Gorila DQN | Massively Parallel Methods for Deep Reinforcement Learning |
| 254.6 | Random | Massively Parallel Methods for Deep Reinforcement Learning |

No-op Starts

| Result | Algorithm | Source |
|---|---|---|
| 42776 | IQN | Implicit Quantile Networks for Distributional Reinforcement Learning |
| 34821 | QR-DQN-1 | Distributional Reinforcement Learning with Quantile Regression |
| 32463.47 | IMPALA (deep) | IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures |
| 30276.5 | PDD DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 24919 | QR-DQN-0 | Distributional Reinforcement Learning with Quantile Regression |
| 20793 | NoisyNet DQN | Noisy Networks for Exploration |
| 18501 | NoisyNet DuDQN | Noisy Networks for Exploration |
| 16926.5 | Human | Dueling Network Architectures for Deep Reinforcement Learning |
| 16850.2 | Rainbow | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 16298 | DuDQN | Noisy Networks for Exploration |
| 14074 | C51 | A Distributional Perspective on Reinforcement Learning |
| 13772.8 | DDQN | A Distributional Perspective on Reinforcement Learning |
| 13581.4 | ACKTR | Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation |
| 13213.4 | Distributional DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 12164.0 | DuDQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 11237 | NoisyNet A3C | Noisy Networks for Exploration |
| 11033.4 | Reactor | The Reactor: A fast and sample-efficient Actor-Critic agent for Reinforcement Learning |
| 10564 | DQN | Noisy Networks for Exploration |
| 9214 | A3C | Noisy Networks for Exploration |
| 8811.8 | Reactor ND | The Reactor: A fast and sample-efficient Actor-Critic agent for Reinforcement Learning |
| 8627.5 | DQN | A Distributional Perspective on Reinforcement Learning |
| 8566.5 | Reactor | The Reactor: A fast and sample-efficient Actor-Critic agent for Reinforcement Learning |
| 8219.92 | IMPALA (shallow) | IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures |
| 8148.1 | A2C | Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation |
| 7654.0 | DDQN | Deep Reinforcement Learning with Double Q-learning |
| 6846 | DQN | Human-level control through deep reinforcement learning |
| 5774.7 | Human | Human-level control through deep reinforcement learning |
| 3302.91 | Gorila DQN | Massively Parallel Methods for Deep Reinforcement Learning |
| 1743 | Contingency | Human-level control through deep reinforcement learning |
| 929.4 | Linear | Human-level control through deep reinforcement learning |
| 698.36 | IMPALA (deep, multitask) | IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures |
| 670.0 | TRPO | Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation |
| 363.9 | Random | Human-level control through deep reinforcement learning |
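
The no-op starts condition above follows the common DQN evaluation convention: each test episode begins with a random number of no-op actions (up to 30) before the agent takes control. Below is a minimal sketch of that reset procedure, reusing the Gymnasium/ALE setup from the Overview and assuming that action 0 is NOOP in ALE's action set.

```python
# Sketch of the "up to 30 no-op starts" evaluation reset used in the DQN line
# of work. Assumes the Gymnasium/ALE setup shown in the Overview section and
# that action 0 is NOOP (true for ALE's default Atari action set).
import random
import gymnasium as gym
import ale_py  # registers the ALE/... environment IDs

NOOP_ACTION = 0

def reset_with_noops(env, max_noops=30, seed=None):
    """Reset the environment, then apply 1..max_noops no-op actions."""
    obs, info = env.reset(seed=seed)
    for _ in range(random.randint(1, max_noops)):
        obs, _, terminated, truncated, info = env.step(NOOP_ACTION)
        if terminated or truncated:  # rare, but restart if the episode ends early
            obs, info = env.reset()
    return obs, info

env = gym.make("ALE/BeamRider-v5")
obs, info = reset_with_noops(env, seed=0)
# ...evaluate the trained policy from `obs` and accumulate the episode reward...
env.close()
```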

Normal Starts

| Result | Algorithm | Source |
|---|---|---|
| 7456 | Human | Playing Atari with Deep Reinforcement Learning |
| 6923 | DQN Ours | Deep Recurrent Q-Learning for Partially Observable MDPs |
| 5702 | UCC-I | Trust Region Policy Optimization |
| 5184 | DQN2013 Best | Playing Atari with Deep Reinforcement Learning |
| 4092 | DQN2013 | Playing Atari with Deep Reinforcement Learning |
| 3863.3 | ACER | Proximal Policy Optimization Algorithms |
| 3760.976 | ACKTR | RL Baselines Zoo b76641e |
| 3616 | HNeat Best | Playing Atari with Deep Reinforcement Learning |
| 3269 | DRQN | Deep Recurrent Q-Learning for Partially Observable MDPs |
| 3031.7 | A2C | Proximal Policy Optimization Algorithms |
| 2809.115 | A2C | RL Baselines Zoo b76641e |
| 2440.692 | ACER | RL Baselines Zoo b76641e |
| 2171.19 | ACKTR | OpenAI Baselines cbd21ef |
| 1959.22 | ACER | OpenAI Baselines cbd21ef |
| 1743 | Contingency | Playing Atari with Deep Reinforcement Learning |
| 1691.072 | PPO | RL Baselines Zoo b76641e |
| 1685.6 | DQN Ours | Deep Recurrent Q-Learning for Partially Observable MDPs |
| 1590.0 | PPO | Proximal Policy Optimization Algorithms |
| 1582.34 | DQN | OpenAI Baselines cbd21ef |
| 1425.2 | TRPO - single path | Trust Region Policy Optimization |
| 1332 | HNeat Pixel | Playing Atari with Deep Reinforcement Learning |
| 1302.61 | A2C | OpenAI Baselines cbd21ef |
| 1299.25 | PPO | OpenAI Baselines cbd21ef |
| 996 | Sarsa | Playing Atari with Deep Reinforcement Learning |
| 888.741 | DQN | RL Baselines Zoo b76641e |
| 859.5 | TRPO - vine | Trust Region Policy Optimization |
| 683.11 | TRPO (MPI) | OpenAI Baselines cbd21ef |
| 618 | DRQN | Deep Recurrent Q-Learning for Partially Observable MDPs |
| 594.45 | PPO (MPI) | OpenAI Baselines cbd21ef |
| 354 | Random | Playing Atari with Deep Reinforcement Learning |
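
Scores from different papers are often compared on a human-normalized scale, 100 × (agent − random) / (human − random), using the random and human baselines reported under the same start condition. The sketch below only illustrates that formula, applied to values taken from the tables above; it is not code from any of the cited papers.

```python
# Illustrative helper (not from the cited papers): human-normalized score,
# i.e. 100 * (agent - random) / (human - random).
def human_normalized(agent: float, human: float, random_score: float) -> float:
    return 100.0 * (agent - random_score) / (human - random_score)

# No-op starts table ("Human-level control through deep reinforcement learning"):
print(human_normalized(6846.0, 5774.7, 363.9))   # DQN     -> ~119.8, above human level
# Normal starts table ("Playing Atari with Deep Reinforcement Learning"):
print(human_normalized(4092.0, 7456.0, 354.0))   # DQN2013 -> ~52.6, below human level
```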