Atari James Bond Environment

Overview

The player controls the titular character of James Bond across four levels. The player is given a multi-purpose vehicle that acts as an automobile, a plane, and a submarine. The vehicle can fire shots and flare bombs, and travels from left to right as the player progresses through each level. The player can shoot or avoid enemies and obstacles that appear throughout the game, including boats, frogmen, helicopters, missiles, and mini-submarines.

Description from Wikipedia

Performances of RL Agents

We list various reinforcement learning algorithms that were tested in this environment. These results are from RL Database. If this page was helpful, please consider giving a star!

Star

Human Starts

Result Algorithm Source
3961.0 Prioritized DDQN (rank, tuned) Prioritized Experience Replay
3511.5 Prioritized DDQN (prop, tuned) Prioritized Experience Replay
1074.5 Prioritized DQN (rank) Prioritized Experience Replay
835.5 DuDQN Dueling Network Architectures for Deep Reinforcement Learning
613.0 A3C LSTM Asynchronous Methods for Deep Reinforcement Learning
585.0 PDD DQN Dueling Network Architectures for Deep Reinforcement Learning
573.0 DDQN (tuned) Deep Reinforcement Learning with Double Q-learning
541.0 A3C FF Asynchronous Methods for Deep Reinforcement Learning
444.0 Gorila DQN Massively Parallel Methods for Deep Reinforcement Learning
416.0 DDQN Deep Reinforcement Learning with Double Q-learning
368.5 Human Massively Parallel Methods for Deep Reinforcement Learning
351.5 A3C FF 1 day Asynchronous Methods for Deep Reinforcement Learning
348.5 DQN Massively Parallel Methods for Deep Reinforcement Learning
33.5 Random Massively Parallel Methods for Deep Reinforcement Learning

No-op Starts

Result Algorithm Source
35108 IQN Implicit Quantile Networks for Distributional Reinforcement Learning
16056.2 Reactor The Reactor: A fast and sample-efficient Actor-Critic agent for Reinforcement Learning
14524.0 Reactor ND The Reactor: A fast and sample-efficient Actor-Critic agent for Reinforcement Learning
7869.2 Reactor The Reactor: A fast and sample-efficient Actor-Critic agent for Reinforcement Learning
4703 QR-DQN-1 Distributional Reinforcement Learning with Quantile Regression
4682 NoisyNet DuDQN Noisy Networks for Exploration
1909 C51 A Distributional Perspective on Reinforcement Learning
1667 DuDQN Noisy Networks for Exploration
1358.0 DDQN A Distributional Perspective on Reinforcement Learning
1312.5 DuDQN Dueling Network Architectures for Deep Reinforcement Learning
1235 NoisyNet DQN Noisy Networks for Exploration
1028 QR-DQN-0 Distributional Reinforcement Learning with Quantile Regression
909 DQN Noisy Networks for Exploration
812.0 PDD DQN Dueling Network Architectures for Deep Reinforcement Learning
768.5 DQN A Distributional Perspective on Reinforcement Learning
605.0 Gorila DQN Massively Parallel Methods for Deep Reinforcement Learning
601.5 IMPALA (deep) IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures
576.7 DQN Human-level control through deep reinforcement learning
509 A3C Noisy Networks for Exploration
490.0 ACKTR Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation
440.0 IMPALA (shallow) IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures
438.0 DDQN Deep Reinforcement Learning with Double Q-learning
406.7 Human Human-level control through deep reinforcement learning
354.1 Contingency Human-level control through deep reinforcement learning
302.8 Human Dueling Network Architectures for Deep Reinforcement Learning
284.0 IMPALA (deep, multitask) IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures
202.8 Linear Human-level control through deep reinforcement learning
188 NoisyNet A3C Noisy Networks for Exploration
29.0 Random Human-level control through deep reinforcement learning

Normal Starts

Result Algorithm Source
560.7 PPO Proximal Policy Optimization Algorithm
261.8 ACER Proximal Policy Optimization Algorithm
52.3 A2C Proximal Policy Optimization Algorithm