Atari Solaris Environment

Overview

The galaxy of Solaris is made up of 16 quadrants, each containing 48 sectors. The player uses a tactical map to choose a sector to warp to; during the warp, they must try to keep their ship “in focus” to lower the fuel consumption rate. Fuel must be managed carefully, as an empty tank costs the player one of their lives. A space battle ensues whenever the player navigates into a hostile battlegroup via the tactical map. Space enemies include pirate ships, mechanoid ships, and aggressive “cobra” ships. Each battlegroup has at least one enemy flagship, which launches fuel-sapping drones.

The player may also descend to one of 3 types of planets.

  1. Friendly federation planets, which allow the player to refuel but may also offer a planet defense mission if they are under attack. If the player allows a friendly planet in a quadrant to be destroyed, that quadrant becomes a “red zone” where joystick controls are reversed and booming sounds are heard.
  2. Enemy Zylon planets, where the player must rescue cadets; rescuing all of them earns an extra ship.
  3. Enemy corridor planets, where the player must fly through a fast-paced corridor.

There are 4 kinds of ground enemies found on planets: stationary guardians, gliders, targeters, and raiders. The ultimate goal of Solaris is to reach the planet Solaris and rescue its colonists, at which point the game ends in victory.

Description from Wikipedia
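
Below is a minimal sketch of loading this environment through Gymnasium and the Arcade Learning Environment. The package names and the "ALE/Solaris-v5" environment id are assumptions about the reader's setup (they match current Gymnasium / ale-py releases but are not specified on this page), and the random policy is purely illustrative.

```python
# Minimal random-agent sketch for the Solaris environment.
# Assumes the Gymnasium + ale-py Atari stack is installed,
# e.g. pip install "gymnasium[atari]".
import gymnasium as gym
import ale_py  # importing ale_py registers the "ALE/..." environment ids

env = gym.make("ALE/Solaris-v5")

obs, info = env.reset(seed=0)
episode_return = 0.0
terminated = truncated = False
while not (terminated or truncated):
    action = env.action_space.sample()  # random policy, for illustration only
    obs, reward, terminated, truncated, info = env.step(action)
    episode_return += reward

print(f"Random-policy episode return: {episode_return}")
env.close()
```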

Performances of RL Agents

We list the scores reported for various reinforcement learning algorithms tested in this environment. These results are from the RL Database. If this page was helpful, please consider giving it a star!

Human Starts

| Result | Algorithm | Source |
| --- | --- | --- |
| 11032.6 | Human | Deep Reinforcement Learning with Double Q-learning |
| 11032.6 | Human | Dueling Network Architectures for Deep Reinforcement Learning |
| 2860.7 | Rainbow | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 2530.2 | Distributional DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 2272.8 | Prioritized DDQN (rank, tuned) | Prioritized Experience Replay |
| 2238.2 | Prioritized DDQN (prop, tuned) | Prioritized Experience Replay |
| 2166.8 | DDQN | Deep Reinforcement Learning with Double Q-learning |
| 2047.2 | Random | Deep Reinforcement Learning with Double Q-learning |
| 1956.0 | A3C FF | Asynchronous Methods for Deep Reinforcement Learning |
| 1936.4 | A3C LSTM | Asynchronous Methods for Deep Reinforcement Learning |
| 1884.8 | A3C FF 1 day | Asynchronous Methods for Deep Reinforcement Learning |
| 1768.4 | DuDQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 1295.4 | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 810.0 | DDQN (tuned) | Deep Reinforcement Learning with Double Q-learning |
| 280.6 | PDD DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 134.6 | Prioritized DQN (rank) | Prioritized Experience Replay |

No-op Starts
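
Scores in this table follow the no-op starts evaluation convention used in the DQN line of papers: each evaluation episode begins with a random number of no-op actions (up to 30) so that the agent does not always face the exact same initial state. A minimal sketch of that reset procedure, under the same Gymnasium/ale-py assumptions as the Overview sketch:

```python
# Sketch of a "30 no-op starts" evaluation episode.
# Assumes the same "ALE/Solaris-v5" Gymnasium setup as above;
# the acting agent is again a stand-in random policy.
import random

import gymnasium as gym
import ale_py  # registers the "ALE/..." environment ids

NOOP_ACTION = 0   # index 0 is NOOP in the ALE action set
MAX_NOOPS = 30

env = gym.make("ALE/Solaris-v5")
obs, info = env.reset(seed=0)

# Random-length no-op prefix before the agent takes control.
for _ in range(random.randint(1, MAX_NOOPS)):
    obs, reward, terminated, truncated, info = env.step(NOOP_ACTION)
    if terminated or truncated:
        obs, info = env.reset()

# The agent then plays out the rest of the episode.
episode_return, done = 0.0, False
while not done:
    obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
    episode_return += reward
    done = terminated or truncated

print(f"No-op-start episode return: {episode_return}")
env.close()
```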

| Result | Algorithm | Source |
| --- | --- | --- |
| 12380 | A3C | Noisy Networks for Exploration |
| 12326.7 | Human | Dueling Network Architectures for Deep Reinforcement Learning |
| 10427 | NoisyNet A3C | Noisy Networks for Exploration |
| 8342 | C51 | A Distributional Perspective on Reinforcement Learning |
| 8007 | IQN | Implicit Quantile Networks for Distributional Reinforcement Learning |
| 6740 | QR-DQN-1 | Distributional Reinforcement Learning with Quantile Regression |
| 6522 | NoisyNet DuDQN | Noisy Networks for Exploration |
| 6088 | NoisyNet DQN | Noisy Networks for Exploration |
| 5643.1 | Distributional DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 4055 | DQN | Noisy Networks for Exploration |
| 3560.3 | Rainbow | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 3482.8 | DQN | A Distributional Perspective on Reinforcement Learning |
| 3482.8 | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 3423 | DuDQN | Noisy Networks for Exploration |
| 2760 | Reactor | The Reactor: A fast and sample-efficient Actor-Critic agent for Reinforcement Learning |
| 2522 | QR-DQN-0 | Distributional Reinforcement Learning with Quantile Regression |
| 2368.6 | ACKTR | Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation |
| 2368.4 | IMPALA (shallow) | IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures |
| 2365.0 | IMPALA (deep) | IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures |
| 2250.8 | DDQN | A Distributional Perspective on Reinforcement Learning |
| 2250.8 | DuDQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 2236.0 | Reactor ND | The Reactor: A fast and sample-efficient Actor-Critic agent for Reinforcement Learning |
| 2099.6 | Reactor | The Reactor: A fast and sample-efficient Actor-Critic agent for Reinforcement Learning |
| 1236.3 | Random | Dueling Network Architectures for Deep Reinforcement Learning |
| 1160.4 | IMPALA (deep, multitask) | IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures |
| 133.4 | PDD DQN | Dueling Network Architectures for Deep Reinforcement Learning |

Normal Starts

| Result | Algorithm | Source |
| --- | --- | --- |
| 3387 | PPO | Exploration by Random Network Distillation |
| 3282 | RND | Exploration by Random Network Distillation |
| 3246 | Dynamics | Exploration by Random Network Distillation |