RL Weekly 28: Free-Lunch Saliency and Hierarchical RL with Behavior Cloning

by Seungjae Ryan Lee

Free-Lunch Saliency via Attention in Atari Agents

Dmitry Nikulin (1), Anastasia Ianina (1), Vladimir Aliev (1), Sergey Nikolenko (1, 2, 3)

(1) Samsung AI Center, Moscow, Russia; (2) Steklov Institute of Mathematics at St. Petersburg, Russia; (3) Neuromation OU, Tallinn, Estonia

What it says

To interpret deep RL agents, saliency maps are commonly used to highlight the pixel regions that the agent deems important. There are two ways to generate such maps: post-hoc saliency methods and built-in saliency methods. Post-hoc methods interpret the agent after training is complete, whereas built-in methods use model architectures designed to be interpretable (Section 2). This work focuses on built-in methods.

As summarized in Table 1 of the paper, there have been multiple works on built-in methods, but these methods perform worse than their non-interpretable counterparts. The authors propose a new method named Free-Lunch Saliency (FLS), claiming that interpretability comes “free”, without a performance drop. The FLS module sits between the convolutional layers and the fully connected layers.
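To make the placement concrete, here is a minimal sketch of a generic spatial soft-attention layer that takes convolutional features and produces both a saliency map and the input to the fully connected layers. It is not the authors' exact architecture: the linear scorer `score_weights`, the toy 2x2x2 feature map, and all function names are assumptions made for illustration.

```python
import math

def spatial_softmax(logits):
    """Normalize a 2-D grid of scores so that all positions sum to 1."""
    peak = max(v for row in logits for v in row)  # subtract the max for numerical stability
    exps = [[math.exp(v - peak) for v in row] for row in logits]
    total = sum(sum(row) for row in exps)
    return [[e / total for e in row] for row in exps]

def attention_layer(features, score_weights):
    """features: H x W x C nested lists coming out of the conv stack.
    score_weights: hypothetical length-C linear scorer (an assumption)."""
    logits = [[sum(w * f for w, f in zip(score_weights, cell)) for cell in row]
              for row in features]
    attn = spatial_softmax(logits)  # H x W map that doubles as the saliency map
    gated = [[[a * f for f in cell] for a, cell in zip(attn_row, feat_row)]
             for attn_row, feat_row in zip(attn, features)]
    return attn, gated

def flatten(gated):
    """Keep every position: H * W * C values go to the fully connected layers."""
    return [f for row in gated for cell in row for f in cell]

def sum_pool(gated):
    """Sum over all positions: only C values go to the fully connected layers."""
    channels = len(gated[0][0])
    return [sum(cell[c] for row in gated for cell in row) for c in range(channels)]

# Toy feature map with H = 2, W = 2, C = 2.
features = [[[1.0, 0.0], [0.0, 1.0]],
            [[3.0, 0.0], [0.0, 0.0]]]
attn, gated = attention_layer(features, score_weights=[1.0, 0.5])
fc_input = flatten(gated)  # 8 values
pooled = sum_pool(gated)   # 2 values
```

Because the attention map is computed in the forward pass, reading it out costs nothing extra at inference time, which is the sense in which such saliency is “built in”.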

To evaluate performance, the authors run experiments on 6 Atari environments and verify that Sparse FLS does not lead to a performance drop (Section 4.2, Table 2). They also show that smaller receptive fields and strides yield crisper saliency maps (Dense FLS). However, Dense FLS also requires more memory and necessitates sum-pooling, and the authors report that it performs significantly worse.
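A little arithmetic shows why smaller strides cost memory. The sketch below applies the usual output-size formula for unpadded convolutions to two layer stacks. The first is the standard DQN stack on 84x84 frames, which is assumed here to resemble the sparse setting. The second, `SMALL_STRIDE`, is a purely hypothetical stack with smaller kernels and strides and is not taken from the paper. The channel count of 64 is likewise an assumption.

```python
def conv_out(size, kernel, stride):
    """Output width of an unpadded convolution along one spatial dimension."""
    return (size - kernel) // stride + 1

def stack_out(size, layers):
    """Apply a list of (kernel, stride) layers in order."""
    for kernel, stride in layers:
        size = conv_out(size, kernel, stride)
    return size

NATURE_DQN = [(8, 4), (4, 2), (3, 1)]    # standard DQN conv stack (assumed baseline)
SMALL_STRIDE = [(3, 1), (3, 2), (3, 1)]  # hypothetical "dense" stack, not from the paper
CHANNELS = 64                            # assumed number of output channels

sparse_side = stack_out(84, NATURE_DQN)   # 84 -> 20 -> 9 -> 7
dense_side = stack_out(84, SMALL_STRIDE)  # 84 -> 82 -> 40 -> 38

sparse_flat = sparse_side * sparse_side * CHANNELS  # 7 * 7 * 64 = 3136
dense_flat = dense_side * dense_side * CHANNELS     # 38 * 38 * 64 = 92416
```

Under these assumptions the attention map grows from 7x7 to 38x38, so the saliency is much finer, but flattening the gated features would feed roughly 30 times more values into the first fully connected layer. Summing over positions instead reduces that input to 64 values, which is one way to read why the dense variant needs sum-pooling.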

To assess the quality of the interpretations, the authors compare the saliency maps generated by various built-in methods against human gaze data from the Atari-HEAD dataset. They note that although no method clearly dominates the others, all of them perform better than random (Section 4.3).

Hierarchical RL with Behavior Cloning

Robin Strudel* (1), Alexander Pashevich* (2), Igor Kalevatykh (1), Ivan Laptev (1), Josef Sivic (1), Cordelia Schmid (2)

(1) Inria, École normale supérieure, CNRS, PSL Research University, 75005 Paris, France; (2) University Grenoble Alpes, Inria, CNRS, Grenoble INP, LJK, 38000 Grenoble, France

What it says

Imitation learning methods learn to solve a task from demonstrations. Although they are efficient at learning short trajectories with limited variability, the agent struggles when it encounters states not seen in the demonstrations. Reinforcement learning can learn without such a demonstration dataset and generalizes better. However, it is not sample-efficient, since it relies on exploration to generalize.

To get the best of both worlds, the authors propose HRL-BC, a method that combines imitation learning and reinforcement learning through hierarchical RL (HRL). First, the agent learns primitive skill policies through behavior cloning (BC) (Section 3.A). Afterwards, a master policy is trained with Proximal Policy Optimization (PPO) to choose among these skills, making its selections at a slower rate than the skills act (Section 3.B).
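The two-level control flow can be sketched as a simple loop: the master policy picks a skill once every few steps, and the chosen skill outputs a low-level action at every step. Everything in this toy example is hypothetical. The 1-D environment, the two hand-written skills, the rule-based master, and the name `skill_period` stand in for the BC-trained skill networks and the PPO-trained master described in the paper.

```python
def run_episode(env_step, master_policy, skills, obs, horizon, skill_period):
    """Run a two-level policy: the master re-selects a skill every `skill_period` steps."""
    trace = []
    skill_id = None
    for t in range(horizon):
        if t % skill_period == 0:
            skill_id = master_policy(obs)  # slow timescale: choose a skill
        action = skills[skill_id](obs)     # fast timescale: the skill acts
        obs = env_step(obs, action)
        trace.append((t, skill_id, action))
    return obs, trace

# Hypothetical toy task: move along a line toward position 5.
TARGET = 5

def move_left(obs):
    return -1

def move_right(obs):
    return 1

def toy_master(obs):
    """Stand-in for the learned master: pick skill 1 (right) while below the target."""
    return 1 if obs < TARGET else 0

def toy_env_step(obs, action):
    return obs + action

final_obs, trace = run_episode(
    toy_env_step, toy_master, [move_left, move_right],
    obs=0, horizon=9, skill_period=3,
)
chosen_skills = [skill_id for _, skill_id, _ in trace]
```

With `skill_period=3`, the master only decides at steps 0, 3, and 6, so the agent overshoots the target before the master can switch skills. This is the trade-off of acting at a slower rate: the master faces a much shorter decision horizon, at the cost of coarser control.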

The authors test HRL-BC on both simulated and real robot arms (UR5). They report that the agent learns skills quickly through behavior cloning with a ResNet architecture and data augmentation (Section 4). Furthermore, HRL-BC achieves a higher success rate than methods that use only imitation learning or only reinforcement learning (Section 5.B).
