## Why I Read This

- It is from Google DeepMind.
- It was listed in OpenAI Spinning Up: Key Papers in Deep RL.

## Short Summary

- Popular deep reinforcement learning algorithms (DQN, A3C) learn slowly and use data inefficiently.
- They require small learning rates, since high learning rates can cause catastrophic interference.
- Sparse reward signals cause the neural network to underperform at predicting large rewards, due to class imbalance.
- Experience replay and target networks slow down reward signal propagation.

- The Neural Episodic Control (NEC) agent consists of three components: a convolutional neural network (CNN), a set of **Differentiable Neural Dictionaries (DNDs)** (one per action), and a final network.
- The agent can perform two operations on a DND: **query** and **write**.
- **Query**: the DND uses a semi-tabular representation. Given a key, it does not simply return a value from a table through lookup. Instead, it finds the $p$ nearest neighbors in the dictionary and returns a weighted sum of their $p$ values (see the sketch after this list).
- The closeness of the query key $h$ to each stored key $h_i$ is measured by the inverse-distance kernel $k(h, h_i) = \frac{1}{\|h - h_i\|_2^2 + \delta}$.
- The weight $w_i$ for the weighted sum is the normalized kernel: $w_i = k(h, h_i) / \sum_j k(h, h_j)$.
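
A minimal sketch of the query operation in NumPy, assuming a brute-force nearest-neighbor search over a flat key array (the paper uses an approximate kd-tree search; the function name and the defaults `p=50` and `delta=1e-3` are illustrative):

```python
import numpy as np

def dnd_query(keys, values, h, p=50, delta=1e-3):
    """Kernel-weighted p-nearest-neighbor lookup over the DND contents.

    keys:   (n, d) array of stored keys h_i
    values: (n,)   array of stored Q-value estimates
    h:      (d,)   query key produced by the CNN embedding
    """
    # Squared L2 distance from the query key to every stored key
    # (brute force here; the paper searches an approximate kd-tree).
    dists = np.sum((keys - h) ** 2, axis=1)

    # Restrict attention to the p nearest neighbors.
    nn = np.argsort(dists)[:p]

    # Inverse-distance kernel k(h, h_i) = 1 / (||h - h_i||_2^2 + delta).
    k = 1.0 / (dists[nn] + delta)

    # Normalized weights w_i, then the weighted sum of stored values.
    w = k / np.sum(k)
    return float(np.dot(w, values[nn]))
```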

- **Write**: after each query, the queried key and its value, computed as the $N$-step Q-learning estimate $Q^{(N)}$, are appended to the DND. If the DND has reached its maximum size, the least recently used key-value pair is overwritten.
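
The value stored is the $N$-step Q estimate $Q^{(N)}(s_t, a) = \sum_{j=0}^{N-1} \gamma^j r_{t+j} + \gamma^N \max_{a'} Q(s_{t+N}, a')$. Below is a minimal sketch of the write path, assuming a flat array store with age counters for LRU eviction; the class and field names are illustrative, and the paper's in-place value update for keys already present in the dictionary is omitted:

```python
import numpy as np

class DND:
    """Per-action dictionary: append on write, LRU-overwrite when full."""

    def __init__(self, capacity, key_dim):
        self.capacity = capacity
        self.keys = np.empty((0, key_dim))
        self.values = np.empty(0)
        self.ages = np.empty(0, dtype=int)  # steps since last write

    def write(self, h, q_n):
        """Insert (h, Q^(N)); evict the least recently used slot when full."""
        self.ages += 1
        if len(self.values) < self.capacity:
            self.keys = np.vstack([self.keys, h])
            self.values = np.append(self.values, q_n)
            self.ages = np.append(self.ages, 0)
        else:
            # Overwrite the least recently used key-value pair.
            i = int(np.argmax(self.ages))
            self.keys[i], self.values[i], self.ages[i] = h, q_n, 0
```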

- The whole pipeline (CNN, DND lookup, and final network) is fully differentiable and is trained end-to-end by minimising the L2 loss between the predicted Q-value and the $Q^{(N)}$ estimate on mini-batches randomly sampled from the replay buffer (a minimal training sketch follows this list).
- NEC does not require clipped rewards, so it outperforms the baseline algorithms (DQN, $Q^*(\lambda)$, Prioritized DQN, A3C, and MFEC) on *Alien* and *Ms. Pac-Man*.
- NEC is vastly more data efficient than all baseline algorithms in other environments where rewards have the same scale (*Bowling*, *Pong*).
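
A minimal sketch of one end-to-end update, restated in PyTorch so gradients flow through the kernel-weighted lookup into the CNN weights, the stored keys, and the stored values alike. For brevity a single dictionary stands in for the per-action DNDs; `predict_q`, the batch layout, and the assumption that `keys` and `values` are trainable tensors registered with the optimizer are all illustrative:

```python
import torch
import torch.nn.functional as F

def predict_q(h, keys, values, p=50, delta=1e-3):
    # Differentiable version of the DND query from the sketch above.
    dists = ((keys - h) ** 2).sum(dim=1)
    nn = torch.topk(-dists, min(p, len(values))).indices
    k = 1.0 / (dists[nn] + delta)
    w = k / k.sum()
    return (w * values[nn]).sum()

def train_step(cnn, keys, values, optimizer, obs_batch, q_n_targets):
    """One update: L2 loss between predicted Q and replayed Q^(N) targets."""
    # Embed each observation, then query the differentiable dictionary.
    h_batch = cnn(obs_batch)
    q_pred = torch.stack([predict_q(h, keys, values) for h in h_batch])

    # L2 loss against the stored N-step estimates from the replay buffer.
    loss = F.mse_loss(q_pred, q_n_targets)
    optimizer.zero_grad()
    loss.backward()   # gradients reach CNN weights, keys, and values
    optimizer.step()
    return loss.item()
```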

## Thoughts

- I should have read Model-Free Episodic Control (MFEC) by Blundell et al. (2016) first, since the paper makes many comparisons between MFEC and NEC.
- Will prioritizing the experience replay improve NEC?
- With more data, will traditional methods (DQN, A3C) outperform NEC again?

## Accompanying Resources

If you want to learn more about memory-based reinforcement learning, check out these resources.

- Reinforcement Learning: An Introduction, Section 9.9, 9.10 (Book)
- Model-Free Episodic Control (arXiv Paper)
- Neural Map: Structured Memory for Deep Reinforcement Learning (arXiv Paper)
- Unsupervised Predictive Memory in a Goal-Directed Agent (arXiv Paper)
- Relational recurrent neural networks (arXiv Paper)