Observational Overfitting in Reinforcement Learning

Song et al., 2019 | https://arxiv.org/abs/1912.02975

  • Observational overfitting: The agent overfits to properties of the observation that are irrelevant to the latent dynamics of the MDP.
  • Effect: The policy can come to rely on these irrelevant features, which can hinder generalization to unseen environments.
  • Evidence 1: The scoreboard and background objects are highlighted in red in the saliency map.
  • Evidence 2: Covering the scoreboard with a black rectangle during training resulted in a 10% increase in test performance.
  • Solution?: Overparametrization can help as a form of “implicit regularization,” improving generalization to the test set.
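
The masking experiment in Evidence 2 can be illustrated with a minimal sketch. This is not the authors' code: the function name `mask_region`, the 64x64 RGB frame, and the assumption that the scoreboard occupies the top 8 rows are all hypothetical, chosen only to show what "covering the scoreboard with a black rectangle" might look like as an observation preprocessing step.

```python
import numpy as np

def mask_region(frame, top, left, height, width):
    """Return a copy of `frame` with the given rectangle set to 0 (black).

    Hypothetical helper: `frame` is assumed to be an (H, W) or (H, W, C)
    image array; the rectangle coordinates are in pixels.
    """
    masked = frame.copy()  # leave the original observation untouched
    masked[top:top + height, left:left + width, ...] = 0
    return masked

# Assumed example: an all-white 64x64 RGB frame whose top 8 rows
# stand in for the scoreboard region.
frame = np.full((64, 64, 3), 255, dtype=np.uint8)
masked = mask_region(frame, top=0, left=0, height=8, width=64)
```

In a training loop, such a function would be applied to every observation before it reaches the policy, so the agent never sees the masked region and cannot fit to it.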