Atari Private Eye Environment

Overview

In Private Eye, players assume the role of Pierre Touché, a private investigator assigned to capture the criminal mastermind Henri Le Fiend. Le Fiend is implicated in a number of crimes across the city, and the player must find the clues and the stolen property in order to arrest him.

The game consists of four separate cases. Using a specially built Model A car that can jump over obstacles, players must search the city for a specific clue to the crime and for the object stolen in the crime. Each item must then be returned to its point of origin: the clue is taken to a business to verify it came from there, and the stolen object is returned to its rightful owner. These items may be discovered in any order, but players may carry only one item at a time. Once both items have been located and returned, the player must find and capture Le Fiend and take him to jail, successfully closing the case.

However, the city is full of street thugs who will attack the player. If the player is hit while carrying an item (either the clue or the stolen property), the item is lost and must be re-located. Further, each case has a statute of limitations, which serves as the game’s time limit. To win the game, the player must locate and verify the clue, locate and return the stolen property, and lastly locate Le Fiend and take him to jail within the time allotted.

The player starts with 1000 “merit points”. Points are lost whenever the player hits an obstacle or is attacked by a thug, and are awarded when an item is located, when it is returned, and when a thug (or Le Fiend himself) is captured. Each case is a separate game variation; when the case is solved or time runs out, the game ends. A fifth variation requires the player to solve all four crimes at once.

Description from Wikipedia
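
The environment is available through the Arcade Learning Environment. Below is a minimal sketch of loading Private Eye with Gymnasium and rolling out a random policy for one episode; the environment ID "ALE/PrivateEye-v5" and the explicit `gym.register_envs` call follow recent ale-py conventions and may differ for older Gym/ALE versions.

```python
# Minimal sketch: load Private Eye via Gymnasium + ale-py and run a random policy.
# Assumes `gymnasium` and `ale-py` are installed; the ID "ALE/PrivateEye-v5" and the
# register_envs call follow recent ale-py releases and may differ in older versions.
import gymnasium as gym
import ale_py

gym.register_envs(ale_py)  # registers the ALE/* environment IDs with Gymnasium

env = gym.make("ALE/PrivateEye-v5")
obs, info = env.reset(seed=0)

episode_return = 0.0
terminated = truncated = False
while not (terminated or truncated):
    action = env.action_space.sample()  # random policy, for illustration only
    obs, reward, terminated, truncated, info = env.step(action)
    episode_return += reward

print(f"Random-policy episode return: {episode_return}")
env.close()
```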

Performances of RL Agents

We list the scores of various reinforcement learning algorithms that were tested in this environment. These results are taken from the RL Database. If this page was helpful, please consider giving it a star!

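The three tables below use the standard Atari evaluation protocols: "no-op starts" begin each evaluation episode with a random number (up to 30) of no-op actions, "human starts" resume from states sampled from human play, and "normal starts" simply reset the environment. The following is a minimal sketch of the no-op protocol, assuming a Gymnasium-style environment and a hypothetical `policy` callable; it is illustrative rather than the exact evaluation code used in the cited papers.

```python
# Sketch of the "no-op starts" evaluation protocol, assuming a Gymnasium-style
# ALE environment; `policy` is a hypothetical callable mapping observations to actions.
import random

NOOP_ACTION = 0   # action index 0 is NOOP in the ALE action set
MAX_NOOPS = 30    # the protocol applies up to 30 random no-ops per episode

def evaluate_noop_starts(env, policy, episodes=30, seed=0):
    rng = random.Random(seed)
    returns = []
    for _ in range(episodes):
        obs, info = env.reset()
        # Execute a random number of no-ops so the agent does not always
        # start from the same deterministic initial state.
        for _ in range(rng.randint(1, MAX_NOOPS)):
            obs, _, terminated, truncated, info = env.step(NOOP_ACTION)
            if terminated or truncated:
                obs, info = env.reset()
        episode_return, done = 0.0, False
        while not done:
            obs, reward, terminated, truncated, info = env.step(policy(obs))
            episode_return += reward
            done = terminated or truncated
        returns.append(episode_return)
    return sum(returns) / len(returns)
```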

Human Starts

| Result | Algorithm | Source |
| --- | --- | --- |
| 64169.1 | Human | Massively Parallel Methods for Deep Reinforcement Learning |
| 5717.5 | Distributional DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 2598.55 | Gorila DQN | Massively Parallel Methods for Deep Reinforcement Learning |
| 2202.3 | Prioritized DQN (rank) | Prioritized Experience Replay |
| 1704.4 | Rainbow | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 1277.6 | PDD DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 670.7 | Prioritized DDQN (rank, tuned) | Prioritized Experience Replay |
| 662.8 | Random | Massively Parallel Methods for Deep Reinforcement Learning |
| 421.1 | A3C LSTM | Asynchronous Methods for Deep Reinforcement Learning |
| 346.3 | DDQN | Deep Reinforcement Learning with Double Q-learning |
| 298.2 | DQN | Massively Parallel Methods for Deep Reinforcement Learning |
| 292.6 | DuDQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 206.9 | A3C FF | Asynchronous Methods for Deep Reinforcement Learning |
| 194.4 | A3C FF 1 day | Asynchronous Methods for Deep Reinforcement Learning |
| 179.0 | Prioritized DDQN (prop, tuned) | Prioritized Experience Replay |
| -575.5 | DDQN (tuned) | Deep Reinforcement Learning with Double Q-learning |

No-op Starts

| Result | Algorithm | Source |
| --- | --- | --- |
| 69571.3 | Human | Dueling Network Architectures for Deep Reinforcement Learning |
| 69571.3 | Human | Human-level control through deep reinforcement learning |
| 15198.0 | Reactor ND | The Reactor: A fast and sample-efficient Actor-Critic agent for Reinforcement Learning |
| 15188.8 | Reactor | The Reactor: A fast and sample-efficient Actor-Critic agent for Reinforcement Learning |
| 15177.1 | Reactor | The Reactor: A fast and sample-efficient Actor-Critic agent for Reinforcement Learning |
| 15172.9 | Distributional DQN | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 15095 | C51 | A Distributional Perspective on Reinforcement Learning |
| 4234.0 | Rainbow | Rainbow: Combining Improvements in Deep Reinforcement Learning |
| 3781 | A3C | Noisy Networks for Exploration |
| 3712 | NoisyNet DQN | Noisy Networks for Exploration |
| 2361 | DQN | Noisy Networks for Exploration |
| 1788 | DQN | Human-level control through deep reinforcement learning |
| 748.6 | Gorila DQN | Massively Parallel Methods for Deep Reinforcement Learning |
| 684.3 | Linear | Human-level control through deep reinforcement learning |
| 670.1 | DDQN | Deep Reinforcement Learning with Double Q-learning |
| 350 | QR-DQN-1 | Distributional Reinforcement Learning with Quantile Regression |
| 279 | NoisyNet DuDQN | Noisy Networks for Exploration |
| 227 | DuDQN | Noisy Networks for Exploration |
| 206.0 | PDD DQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 200 | IQN | Implicit Quantile Networks for Distributional Reinforcement Learning |
| 146.7 | DQN | A Distributional Perspective on Reinforcement Learning |
| 146 | QR-DQN-0 | Distributional Reinforcement Learning with Quantile Regression |
| 129.7 | DDQN | A Distributional Perspective on Reinforcement Learning |
| 103.0 | DuDQN | Dueling Network Architectures for Deep Reinforcement Learning |
| 100 | NoisyNet A3C | Noisy Networks for Exploration |
| 98.5 | IMPALA (deep) | IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures |
| 92.42 | IMPALA (shallow) | IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures |
| 86 | Contingency | Human-level control through deep reinforcement learning |
| 24.9 | Random | Human-level control through deep reinforcement learning |
| 0.0 | IMPALA (deep, multitask) | IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures |

Normal Starts

| Result | Algorithm | Source |
| --- | --- | --- |
| 8666 | RND | Exploration by Random Network Distillation |
| 182.0 | ACER | Proximal Policy Optimization Algorithms |
| 105 | PPO | Exploration by Random Network Distillation |
| 91.3 | A2C | Proximal Policy Optimization Algorithms |
| 69.5 | PPO | Proximal Policy Optimization Algorithms |
| 33 | Dynamics | Exploration by Random Network Distillation |