Paper Annotations

Why I Read This

Short Summary

  • Popular deep reinforcement learning algorithms (DQN, A3C) suffer from slow learning and poor data efficiency, for three reasons:
    1. They must use small learning rates, because high learning rates cause catastrophic interference.
    2. Sparse reward signals cause the neural network to underperform at predicting large rewards, since rewarding states are rare (a class-imbalance problem).
    3. Experience replay and target networks slow down the propagation of the reward signal.
  • The Neural Episodic Control (NEC) agent consists of three components: a convolutional neural network (CNN), a set of Differentiable Neural Dictionaries (DNDs, one per action), and a final network that converts DND read-outs into Q-values.
  • The agent can perform two operations on a DND: query and write.
    • query: the DND uses a semi-tabular representation. Given a key, it does not simply return a value from a table through lookup; instead, it finds the $p$ nearest neighbors in the dictionary and returns a weighted sum of these $p$ values (see the sketch after this list).
      • The distance between the query key $h$ and each key $h_i$ in the DND is converted into a similarity by the kernel function $k(h, h_i) = 1 / (||h - h_i||^2_2 + \delta)$.
      • The weight $w_i$ for the weighted sum is the normalized kernel: $w_i = k(h, h_i) / \sum_j k(h, h_j)$.
    • write: after each query, the queried key and its value (the $N$-step Q-learning estimate $Q^{(N)}$) are added to the DND. If the DND is at its maximum size, the least recently used key-value pair is overwritten.
  • The whole CNN-DND-final network pipeline is fully differentiable and is trained end-to-end by minimising the L2 loss between the predicted Q value and the $Q^{(N)}$ estimate on mini-batches sampled randomly from the replay buffer (see the loss sketch below).
  • NEC does not require clipped rewards, so it outperforms the baseline algorithms (DQN, $Q^*(\lambda)$, Prioritized DQN, A3C, and MFEC) on Alien and Ms. Pacman.
  • In other environments where rewards are already on a similar scale (Bowling, Pong), NEC is still vastly more data efficient than all the baseline algorithms.
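
To make query and write concrete, here is a minimal NumPy sketch of a single-action DND (a hypothetical `DND` class, not the authors' implementation). It uses an exact brute-force nearest-neighbor search where the paper uses an approximate kd-tree, and it omits the tabular update applied when a queried key is already in the dictionary.

```python
import numpy as np

class DND:
    """Sketch of a Differentiable Neural Dictionary for a single action."""

    def __init__(self, max_size, p=50, delta=1e-3):
        self.max_size = max_size   # maximum number of stored key-value pairs
        self.p = p                 # number of nearest neighbors per query
        self.delta = delta         # kernel smoothing constant
        self.keys, self.values = [], []
        self.last_used = []        # timestamps for least-recently-used overwrite
        self.t = 0

    def query(self, h):
        """Kernel-weighted sum of the values of the p nearest keys to h."""
        self.t += 1
        dists = np.sum((np.stack(self.keys) - h) ** 2, axis=1)  # squared L2 distances
        nn = np.argsort(dists)[: self.p]                         # p nearest neighbors
        k = 1.0 / (dists[nn] + self.delta)                       # kernel k(h, h_i)
        w = k / k.sum()                                          # normalized weights w_i
        for i in nn:
            self.last_used[i] = self.t                           # mark neighbors as used
        return float(np.dot(w, np.asarray(self.values)[nn]))

    def write(self, h, q_n):
        """Append (h, Q^(N)); overwrite the least recently used slot when full."""
        self.t += 1
        if len(self.keys) < self.max_size:
            self.keys.append(h); self.values.append(q_n); self.last_used.append(self.t)
        else:
            i = int(np.argmin(self.last_used))                   # least recently used pair
            self.keys[i], self.values[i], self.last_used[i] = h, q_n, self.t
```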
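
The $Q^{(N)}$ target and the L2 training loss mentioned above can be sketched similarly; `rewards`, `bootstrap_q`, and `q_pred` below are hypothetical placeholders for the next $N$ rewards, $\max_{a'} Q(s_{t+N}, a')$, and the NEC prediction, assuming the usual $N$-step bootstrapped return.

```python
def n_step_q_target(rewards, bootstrap_q, gamma=0.99):
    """Q^(N) = sum_{j=0}^{N-1} gamma^j * r_{t+j} + gamma^N * max_a' Q(s_{t+N}, a')."""
    target = sum((gamma ** j) * r for j, r in enumerate(rewards))
    return target + (gamma ** len(rewards)) * bootstrap_q

def l2_loss(q_pred, q_target):
    """Per-sample L2 loss between the predicted Q value and the Q^(N) estimate."""
    return (q_pred - q_target) ** 2
```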

Thoughts

  • I should have read Model-Free Episodic Control (MFEC) by Blundell et al. (2016) first; the paper makes many comparisons between MFEC and NEC.
  • Will prioritizing the experience replay improve NEC?
  • With more data, will traditional methods (DQN, A3C) outperform NEC again?

Accompanying Resources

If you want to learn more about memory-based Reinforcement Learning, check out these resources.