NOTE. This stub post is for my lab teammates. It will be populated after posts 3 and 4 are published.
The Unity team used Rainbow and PPO agents to test their environments. Although they did not perform any hyperparameter tuning, they made it clear that neither vanilla Rainbow nor vanilla PPO can solve the 25-floor environment.
Fortunately, they also listed several methods with high potential to improve the score. This post will cover the central idea of each of them:
FeUdal Networks for Hierarchical Reinforcement Learning
Data-Efficient Hierarchical Reinforcement Learning
Curiosity-driven Exploration by Self-supervised Prediction
Exploration by Random Network Distillation
Unifying Count-Based Exploration and Intrinsic Motivation
Count-Based Exploration with Neural Density Models
Variational Information Maximisation for Intrinsically Motivated Reinforcement Learning
Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks
Fast Reinforcement Learning via Slow Reinforcement Learning
Imagination-Augmented Agents for Deep Reinforcement Learning