Atari Solaris Environment

Overview

The galaxy of Solaris is made up of 16 quadrants, each containing 48 sectors. The player uses a tactical map to choose a sector to warp to; during the warp, they must try to keep their ship “in focus” to lower the fuel consumption rate. Fuel must be managed carefully, as an empty tank costs the player one of their lives. A space battle ensues whenever the player navigates into a hostile battlegroup via the tactical map. Space enemies include pirate ships, mechanoid ships, and aggressive “cobra” ships. Each battlegroup has at least one enemy flagship, which launches fuel-sapping drones.

The player may also descend to one of 3 types of planets.

  1. Friendly federation planets, which allow the player to refuel but may also offer a planet defense mission if they are under attack. If the player allows a friendly planet in a quadrant to be destroyed, that quadrant becomes a “red zone” where joystick controls are reversed and booming sounds are heard.
  2. Enemy Zylon planets, where the player must rescue cadets; rescuing all of them earns an extra ship.
  3. Enemy corridor planets, where the player must fly through a fast-paced corridor.

There are 4 kinds of ground enemies found on planets: stationary guardians, gliders, targeters, and raiders. The ultimate goal of Solaris is to reach the planet Solaris and rescue its colonists, at which point the game ends in victory.

Description from Wikipedia
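
Below is a minimal sketch of loading this environment through Gymnasium and the Arcade Learning Environment. The package names and the "ALE/Solaris-v5" environment id are assumptions about the reader's setup (they match current Gymnasium / ale-py releases but are not specified on this page), and the random policy is purely illustrative.

```python
# Minimal random-agent sketch for the Solaris environment.
# Assumes the Gymnasium + ale-py Atari stack is installed,
# e.g. pip install "gymnasium[atari]".
import gymnasium as gym
import ale_py  # importing ale_py registers the "ALE/..." environment ids

env = gym.make("ALE/Solaris-v5")

obs, info = env.reset(seed=0)
episode_return = 0.0
terminated = truncated = False
while not (terminated or truncated):
    action = env.action_space.sample()  # random policy, for illustration only
    obs, reward, terminated, truncated, info = env.step(action)
    episode_return += reward

print(f"Random-policy episode return: {episode_return}")
env.close()
```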

Performances of RL Agents

We list the scores reported for various reinforcement learning algorithms tested in this environment. These results are from the RL Database. If this page was helpful, please consider giving it a star!

Human Starts

| Result | Algorithm | Source |
| --- | --- | --- |
| 11032.6 | Human | Deep Reinforcement Learning with Double Q-learning |
| 11032.6 | Human | Dueling Network Architectures for Deep Reinforcement Learning |
| 2860.7 | Rainbow | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 2530.2 | Distributional DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 2272.8 | Prioritized DDQN (rank, tuned) | Prioritized Experience Replay |
| 2238.2 | Prioritized DDQN (prop, tuned) | Prioritized Experience Replay |
| 2166.8 | DDQN | Deep Reinforcement Learning with Double Q-learning |
| 2047.2 | Random | Deep Reinforcement Learning with Double Q-learning |
| 1956.0 | A3C FF | Asynchronous Methods for Deep Reinforcement Learning |
| 1936.4 | A3C LSTM | Asynchronous Methods for Deep Reinforcement Learning |
| 1884.8 | A3C FF 1 day | Asynchronous Methods for Deep Reinforcement Learning |
| 1768.4 | DuDQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 1295.4 | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 810.0 | DDQN (tuned) | Deep Reinforcement Learning with Double Q-learning |
| 280.6 | PDD DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 134.6 | Prioritized DQN (rank) | Prioritized Experience Replay |

No-op Starts
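
Scores in this table follow the no-op starts evaluation convention used in the DQN line of papers: each evaluation episode begins with a random number of no-op actions (up to 30) so that the agent does not always face the exact same initial state. A minimal sketch of that reset procedure, under the same Gymnasium/ale-py assumptions as the Overview sketch:

```python
# Sketch of a "30 no-op starts" evaluation episode.
# Assumes the same "ALE/Solaris-v5" Gymnasium setup as above;
# the acting agent is again a stand-in random policy.
import random

import gymnasium as gym
import ale_py  # registers the "ALE/..." environment ids

NOOP_ACTION = 0   # index 0 is NOOP in the ALE action set
MAX_NOOPS = 30

env = gym.make("ALE/Solaris-v5")
obs, info = env.reset(seed=0)

# Random-length no-op prefix before the agent takes control.
for _ in range(random.randint(1, MAX_NOOPS)):
    obs, reward, terminated, truncated, info = env.step(NOOP_ACTION)
    if terminated or truncated:
        obs, info = env.reset()

# The agent then plays out the rest of the episode.
episode_return, done = 0.0, False
while not done:
    obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
    episode_return += reward
    done = terminated or truncated

print(f"No-op-start episode return: {episode_return}")
env.close()
```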

| Result | Algorithm | Source |
| --- | --- | --- |
| 12380 | A3C | Noisy Networks for Exploration |
| 12326.7 | Human | Dueling Network Architectures for Deep Reinforcement Learning |
| 10427 | NoisyNet A3C | Noisy Networks for Exploration |
| 8342 | C51 | A Distributional Perspective on Reinforcement Learning |
| 8007 | IQN | Implicit Quantile Networks for Distributional Reinforcement Learning |
| 6740 | QR-DQN-1 | Distributional Reinforcement Learning with Quantile Regression |
| 6522 | NoisyNet DuDQN | Noisy Networks for Exploration |
| 6088 | NoisyNet DQN | Noisy Networks for Exploration |
| 5643.1 | Distributional DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 4055 | DQN | Noisy Networks for Exploration |
| 3560.3 | Rainbow | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 3482.8 | DQN | A Distributional Perspective on Reinforcement Learning |
| 3482.8 | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 3423 | DuDQN | Noisy Networks for Exploration |
| 2760 | Reactor | The Reactor: A fast and sample-efficient Actor-Critic agent for Reinforcement Learning |
| 2522 | QR-DQN-0 | Distributional Reinforcement Learning with Quantile Regression |
| 2368.6 | ACKTR | Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation |
| 2368.4 | IMPALA (shallow) | IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures |
| 2365.0 | IMPALA (deep) | IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures |
| 2250.8 | DDQN | A Distributional Perspective on Reinforcement Learning |
| 2250.8 | DuDQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 2236.0 | Reactor ND | The Reactor: A fast and sample-efficient Actor-Critic agent for Reinforcement Learning |
| 2099.6 | Reactor | The Reactor: A fast and sample-efficient Actor-Critic agent for Reinforcement Learning |
| 1236.3 | Random | Dueling Network Architectures for Deep Reinforcement Learning |
| 1160.4 | IMPALA (deep, multitask) | IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures |
| 133.4 | PDD DQN | Dueling Network Architectures for Deep Reinforcement Learning |

Normal Starts

| Result | Algorithm | Source |
| --- | --- | --- |
| 3387 | PPO | Exploration by Random Network Distillation |
| 3282 | RND | Exploration by Random Network Distillation |
| 3246 | Dynamics | Exploration by Random Network Distillation |