RL Weekly 20: Minecraft Competition, Off-policy Policy Evaluation via Classification, and Soft-attention Agent for Interpretability

Published Jun 11, 2019 by Seungjae Ryan Lee

MineRL: Learn Minecraft from Human Priors

William H. Guss*¹, Cayden Codel, Katja Hofmann², Brandon Houghton¹, Noboru Kuno², Stephanie Milani³, Sharada Mohanty⁴, Diego Perez Liebana⁵, Ruslan Salakhutdinov¹, Nicholay Topin¹, Manuela Veloso¹, Phillip Wang¹

¹Carnegie Mellon University ²Microsoft Research ³University of Maryland ⁴AICrowd ⁵Queen Mary University of London

What it is

MineRL is a competition for the upcoming NeurIPS 2019 conference. It uses the Minecraft environment, and participants must train an agent to obtain diamonds. Because this is a very difficult task, the organizers also provide the MineRL dataset, a large-scale dataset of human demonstrations.

The competition started a few days ago and will end on October 25th. According to the organizers, Preferred Networks will be releasing a set of baselines for the competition soon.

Why it matters

Reinforcement learning competitions are great opportunities for new RL researchers to gain first-hand experience. MineRL stands out by providing human demonstration data. It is difficult for individual researchers to collect a large number of demonstrations to test their ideas. The competition alleviates this problem and lets researchers implement their own algorithms without worrying about collecting data.

Read more

External Resources

Off-Policy Evaluation via Off-Policy Classification

Alex Irpan¹, Kanishka Rao¹, Konstantinos Bousmalis², Chris Harris¹, Julian Ibarz¹, Sergey Levine¹,³

¹Google Brain ²DeepMind ³UC Berkeley

What it is & Why it matters

Traditionally, a trained agent is evaluated by letting it interact with the target environment. This is feasible when the target environment is simulated, but it can be problematic in real-life applications such as robotics. In these cases, off-policy evaluation (OPE) methods should be used. Unlike existing OPE methods, which require a good model of the environment or rely on importance sampling, this paper frames OPE as a “positive-unlabeled” classification problem. A state-action pair is labeled “effective” if an optimal policy can achieve success from that situation, and “catastrophic” otherwise. The intuition is that a well-learned Q-function should return high values for effective state-action pairs and low values for catastrophic ones.
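The intuition above can be illustrated with a small sketch. This is not the paper's exact metric. It is a simplified stand-in that assumes Q-values lie in [0, 1], that pairs from successful episodes are the known "effective" positives, and that every other pair is unlabeled. The function name `separation_score` and all Q-values below are hypothetical.

```python
from statistics import mean


def separation_score(q_all, q_positive):
    """Mean Q on known-effective pairs minus mean Q over all pairs.

    A simplified illustration (not the paper's metric): a Q-function that
    ranks effective pairs above the rest gets a larger score.
    """
    return mean(q_positive) - mean(q_all)


# Hypothetical Q-values that two candidate Q-functions assign to the same
# six state-action pairs. The first three pairs come from successful
# episodes (labeled effective); the last three are unlabeled.
num_positive = 3
q_good = [0.9, 0.8, 0.85, 0.2, 0.7, 0.1]
q_bad = [0.5, 0.4, 0.6, 0.5, 0.6, 0.4]

score_good = separation_score(q_good, q_good[:num_positive])
score_bad = separation_score(q_bad, q_bad[:num_positive])

# Rank the candidates without running either policy in the environment.
best = "q_good" if score_good > score_bad else "q_bad"
print(best, round(score_good, 3), round(score_bad, 3))
```

The point of the sketch is that the score needs only logged data and success labels, so candidate Q-functions can be ranked without any new interaction with the real environment.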

Read more

Towards Interpretable Reinforcement Learning Using Attention Augmented Agents

Alex Mott*¹, Daniel Zoran*¹, Mike Chrzanowski¹, Daan Wierstra¹, Danilo J. Rezende¹

¹DeepMind

What it is & Why it matters

The authors propose an LSTM architecture with a soft, top-down, spatial attention mechanism. The paper is not the first to use attention in RL agents, but its numerous experiments show how attention can be used to qualitatively evaluate and interpret agents’ abilities. The project website below shows how attention can be used to understand how the agent reacts to novel states, how the agent plans, and what the agent’s strategy is.
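To make "soft spatial attention" concrete, here is a minimal sketch of the generic mechanism, not the authors' architecture. It assumes a query vector (in a top-down design this would come from the recurrent state) is compared against a key vector at every spatial location, and a softmax over locations gives an attention map that can be inspected directly. The function name and the toy inputs are hypothetical.

```python
import math


def soft_spatial_attention(keys, values, query):
    """Generic soft attention over an H x W grid.

    keys, values: H x W grids of vectors; query: a single vector.
    Returns the H x W attention map and the attention-weighted value vector.
    """
    height, width = len(keys), len(keys[0])

    # Dot-product similarity between the query and the key at each location.
    logits = [
        [sum(k * q for k, q in zip(keys[i][j], query)) for j in range(width)]
        for i in range(height)
    ]

    # Softmax over all spatial locations (shifted by the max for stability).
    max_logit = max(max(row) for row in logits)
    exps = [[math.exp(x - max_logit) for x in row] for row in logits]
    total = sum(sum(row) for row in exps)
    attention = [[e / total for e in row] for row in exps]

    # Weighted sum of the value vectors, one output per value dimension.
    dim = len(values[0][0])
    answer = [
        sum(attention[i][j] * values[i][j][d]
            for i in range(height) for j in range(width))
        for d in range(dim)
    ]
    return attention, answer


# Toy 2 x 2 feature map: the bottom-right key matches the query best.
keys = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 0.0], [4.0, 0.0]]]
values = [[[1.0], [2.0]], [[3.0], [4.0]]]
query = [2.0, 0.0]

attention, answer = soft_spatial_attention(keys, values, query)
for row in attention:
    print([round(a, 4) for a in row])
```

Because the attention map is a probability distribution over image locations, it can be overlaid on the input frame, which is what makes this kind of agent easier to inspect.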

Read more

Some more exciting news in RL:

Researchers at Google Research open-sourced Google Research Football, an RL environment for football.
Researchers at Electronic Arts and Institute of Computational Modelling trained Non-Player Characters (NPCs) in video games through imitation learning with a human in the loop.
Researchers at OpenAI performed an empirical study to show the relationship between hyperparameters and generalization.
Researchers at Nanjing University proposed Clustered RL that divides collected states into clusters and defines a clustering-based bonus reward to incentivise exploration.
Researchers at MIT, Harvard, Diffeo, and CBMM developed DeepRole, a multi-agent RL agent that learns who to cooperate with and outperforms humans in the game The Resistance: Avalon.
Researchers at DeepMind and University of Toronto proposed OPRE (OPtions as REsponses), a multi-agent hierarchical agent.
Researchers at University of Oxford proposed Independent Centrally-assisted Q-Learning (ICQL), which allows using intrinsic rewards for multi-agent RL.
Researchers at Microsoft Research Montreal and Imperial College London hypothesized that the poor performance of low discount factors is not due to small action-gaps but due to “the size-difference of the action gaps across the state-space.”
