Overview

The players’ characters, called Worriors, must kill all the monsters by shooting them. Player one has yellow Worriors, on the right, and player two has blue Worriors, on the left. In a two-player game, the players are also able to shoot each other’s Worriors, earning bonus points and causing the other player to lose a life. Team-oriented players can successfully advance through the game by standing back-to-back (such as in a corner) and firing at anything that comes at them.

Each dungeon is a single-screen rectangular grid of walls and corridors in various formations, through which the Worriors and the monsters can travel freely. Each dungeon has doors at its left and right edges that connect with each other, making the dungeon wrap around. Whenever a player or monster passes through a door, both doors deactivate for a short period and become impassable; the exception is when the Worluk or the Wizard of Wor is in the dungeon, in which case a player who exits through a door can pop back through immediately. A small radar display indicates the positions of all active monsters.

As long as a player has at least one life in reserve, a backup Worrior is displayed in a small sealed cubbyhole at the corresponding bottom corner of the dungeon. When the current Worrior is killed, the cubbyhole opens and the player has ten seconds to move the backup into play before it is forced in automatically.

The various monsters include the following:

  • Burwor: A blue wolf-type creature.
  • Garwor: A yellow Tyrannosaurus rex-type creature.
  • Thorwor: A red scorpion-like creature.
  • Worluk: An Insectoid-type creature.
  • Wizard of Wor: A blue wizard.

Both Garwors and Thorwors can turn invisible at times, but they always appear on the radar. All enemies except the Worluk can shoot at the Worriors.

Each dungeon starts filled with six Burwors. In the first dungeon, killing the last Burwor will make a Garwor appear; in the second, the last two Burwors are replaced by Garwors when killed; and so on. From the sixth dungeon on, a Garwor will replace every Burwor when killed. On every screen, killing a Garwor causes a Thorwor to appear. There will never be more than six enemies on the screen at once. From the second dungeon on, after the last Thorwor is killed, a Worluk will appear and try to escape through one of the side doors. The level ends when the Worluk either escapes or is killed; in the latter case, all point values for the next dungeon are doubled.
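The replacement rules above can be sketched as a small function. This is an illustrative model of the description, not the game's actual code; the function and parameter names are my own.

```python
from typing import Optional

def garwors_from_burwors(dungeon: int) -> int:
    """Number of final Burwors in `dungeon` that respawn as Garwors:
    1 in dungeon 1, 2 in dungeon 2, ..., all 6 from dungeon 6 on."""
    return min(dungeon, 6)

def spawn_on_kill(dungeon: int, killed: str, burwors_left: int) -> Optional[str]:
    """Monster (if any) that appears when `killed` dies.

    `burwors_left` counts the Burwors remaining after the kill."""
    if killed == "Burwor" and burwors_left < garwors_from_burwors(dungeon):
        return "Garwor"      # the dungeon's last few Burwors return as Garwors
    if killed == "Garwor":
        return "Thorwor"     # every Garwor killed yields a Thorwor
    return None              # Thorwors spawn nothing further
```

So in dungeon 1 only the last Burwor becomes a Garwor, while in dungeon 3 the last three do, matching the progression described above.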

From the second dungeon onward, the Wizard of Wor himself appears once the Worluk has either escaped or been killed. After a few seconds the Wizard disappears and teleports across the dungeon, gradually closing in on a Worrior. He remains in the dungeon until he shoots a Worrior or is killed, and he taunts the players throughout the game via a speech synthesizer.

Players are referred to as “Worriors” during the first seven levels, then as “Worlords” beyond that point. The “Worlord Dungeons” are more difficult than the earlier levels because they have fewer interior walls.

There are two special dungeons with increased difficulty. Level 4 is “The Arena,” with a large open area in its center, and Level 13 is “The Pit,” with no interior walls at all. A bonus Worrior is awarded before each of these levels. Every sixth dungeon after Level 13 is another Pit. A player who survives any Pit level without losing a life earns the title of “Worlord Supreme.”
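The special-level schedule above reduces to a simple check. These are hypothetical helpers for illustration, not game code.

```python
def is_arena(level: int) -> bool:
    """Level 4 is The Arena."""
    return level == 4

def is_pit(level: int) -> bool:
    """Level 13 is The Pit, as is every sixth dungeon after it (19, 25, 31, ...)."""
    return level >= 13 and (level - 13) % 6 == 0
```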

Description from Wikipedia

State of the Art

Human Starts

| Result | Method | Type | Score from |
|-------:|--------|------|------------|
| 46897.0 | ApeX DQN | DQN | Distributed Prioritized Experience Replay |
| 18082.0 | A3C LSTM | PG | Asynchronous Methods for Deep Reinforcement Learning |
| 17244.0 | A3C FF (4 days) | PG | Asynchronous Methods for Deep Reinforcement Learning |
| 14631.5 | RainbowDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 11824.5 | DistributionalDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 10471.0 | DuelingPERDQN | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 10431.0 | GorilaDQN | DQN | Massively Parallel Methods for Deep Reinforcement Learning |
| 7451.0 | PERDDQN (prop) | DQN | Prioritized Experience Replay |
| 7054.0 | DuelingDDQN | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 6201.0 | DDQN | DQN | Deep Reinforcement Learning with Double Q-learning |
| 5727.0 | PERDDQN (rank) | DQN | Prioritized Experience Replay |
| 5278.0 | A3C FF (1 day) | PG | Asynchronous Methods for Deep Reinforcement Learning |
| 4796.5 | NoisyNetDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 4556.0 | Human | Human | Massively Parallel Methods for Deep Reinforcement Learning |
| 2755.0 | PERDQN (rank) | DQN | Prioritized Experience Replay |
| 1609.0 | DQN2015 | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 804.0 | Random | Random | Massively Parallel Methods for Deep Reinforcement Learning |
| 246.0 | DQN2015 | DQN | Massively Parallel Methods for Deep Reinforcement Learning |

No-op Starts

| Result | Method | Type | Score from |
|-------:|--------|------|------------|
| 46204.0 | ApeX DQN | DQN | Distributed Prioritized Experience Replay |
| 17862.5 | RainbowDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 15994.5 | DistributionalDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 13731.33 | GorilaDQN | DQN | Massively Parallel Methods for Deep Reinforcement Learning |
| 12723.0 | NoisyNet-A3C | PG | Noisy Networks for Exploration |
| 12352.0 | DuelingPERDQN | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 10373.0 | PERDDQN (prop) | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 9300.0 | C51 | Misc | A Distributional Perspective on Reinforcement Learning |
| 9198.0 | NoisyNet-DQN | DQN | Noisy Networks for Exploration |
| 9149.0 | NoisyNet-DuelingDQN | DQN | Noisy Networks for Exploration |
| 8953.0 | A3C | PG | Noisy Networks for Exploration |
| 7855.0 | DuelingDDQN | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 7492.0 | DDQN | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 6534.0 | DuelingDQN | DQN | Noisy Networks for Exploration |
| 5432.0 | NoisyNetDQN | DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 5204.0 | DDQN | DQN | Deep Reinforcement Learning with Double Q-learning |
| 4802.0 | PER | DQN | Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation |
| 4802.0 | PERDDQN (rank) | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 4757.0 | Human | Human | Human-level control through deep reinforcement learning |
| 4756.5 | Human | Human | Learning values across many orders of magnitude |
| 3601.0 | DQN | DQN | Noisy Networks for Exploration |
| 3393.0 | DQN2015 | DQN | Human-level control through deep reinforcement learning |
| 2704.0 | DQN2015 | DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 1981.0 | Linear | Misc | Human-level control through deep reinforcement learning |
| 702.0 | ACKTR | PG | Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation |
| 563.5 | Random | Random | Human-level control through deep reinforcement learning |
| 483.0 | DDQN+PopArt | DQN | Learning values across many orders of magnitude |
| 36.9 | Contingency | Misc | Human-level control through deep reinforcement learning |
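The headings above name evaluation protocols from the DQN literature: under "no-op starts" each evaluation episode begins with a random number of no-op actions (up to 30) so agents cannot exploit a single deterministic start state, while "human starts" episodes begin from states sampled from human play. A minimal sketch of the no-op protocol follows, using a toy stand-in environment rather than the real Arcade Learning Environment; `ToyEnv` and `evaluate_with_noop_starts` are illustrative names, not from any library.

```python
import random

NOOP_ACTION = 0   # by ALE convention, action index 0 is NOOP
MAX_NOOPS = 30    # cap commonly used in the DQN literature

class ToyEnv:
    """Stand-in for an Atari env: reward 1.0 per step, episode ends
    after 5 steps. step() returns (obs, reward, done)."""
    def reset(self):
        self.t = 0
        return 0
    def step(self, action):
        self.t += 1
        return 0, 1.0, self.t >= 5

def evaluate_with_noop_starts(env, policy, episodes, rng):
    """Mean episode return when each episode starts with 1..MAX_NOOPS no-ops."""
    totals = []
    for _ in range(episodes):
        obs = env.reset()
        for _ in range(rng.randint(1, MAX_NOOPS)):
            obs, _, done = env.step(NOOP_ACTION)
            if done:              # episode ended during the no-ops: restart
                obs = env.reset()
        total, done = 0.0, False
        while not done:
            obs, reward, done = env.step(policy(obs))
            total += reward
        totals.append(total)
    return sum(totals) / len(totals)

mean_return = evaluate_with_noop_starts(
    ToyEnv(), policy=lambda obs: 1, episodes=5, rng=random.Random(0))
```

"Normal starts" below simply means episodes begin from the environment's default reset state with no randomization of this kind.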

Normal Starts

| Result | Method | Type | Score from |
|-------:|--------|------|------------|
| 4185.3 | PPO | PG | Proximal Policy Optimization Algorithms |
| 2308.3 | ACER | PG | Proximal Policy Optimization Algorithms |
| 859.0 | A2C | PG | Proximal Policy Optimization Algorithms |