Observational Overfitting in Reinforcement Learning

Song et al., 2019 | https://arxiv.org/abs/1912.02975

  • Agents can overfit to parts of the observation that are irrelevant to the MDP dynamics, such as a scoreboard or the background, because those features are correlated with progress.
  • Observational overfitting hurts the agent's generalization.
  • Overparametrization can mitigate observational overfitting and improve generalization.
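
The first two points can be illustrated with a toy sketch. This is not the paper's experimental setup: it swaps the RL agent for a plain least-squares predictor of progress, and every name in it (`make_batch`, `fit_least_squares`, the "scoreboard" feature) is hypothetical. It assumes an observation with one noisy but dynamics-relevant feature and one "scoreboard" feature that copies progress exactly during training and becomes unrelated to progress at test time.

```python
import random


def make_batch(n, scoreboard_tracks_progress, rng):
    """Return (observations, targets) for a toy progress-prediction task."""
    obs, targets = [], []
    for _ in range(n):
        progress = rng.random()                    # latent task progress in [0, 1)
        relevant = progress + rng.gauss(0.0, 0.1)  # noisy, dynamics-relevant feature
        if scoreboard_tracks_progress:
            scoreboard = progress                  # spurious feature: clean copy of progress
        else:
            scoreboard = rng.random()              # same slot, now unrelated to progress
        obs.append((relevant, scoreboard))
        targets.append(progress)
    return obs, targets


def fit_least_squares(obs, targets):
    """Solve the 2x2 normal equations for a linear model without a bias term."""
    a = sum(x1 * x1 for x1, _ in obs)
    b = sum(x1 * x2 for x1, x2 in obs)
    d = sum(x2 * x2 for _, x2 in obs)
    r1 = sum(x1 * t for (x1, _), t in zip(obs, targets))
    r2 = sum(x2 * t for (_, x2), t in zip(obs, targets))
    det = a * d - b * b
    return ((d * r1 - b * r2) / det, (a * r2 - b * r1) / det)


def mse(weights, obs, targets):
    """Mean squared error of the linear predictor on a batch."""
    w1, w2 = weights
    errors = [(w1 * x1 + w2 * x2 - t) ** 2 for (x1, x2), t in zip(obs, targets)]
    return sum(errors) / len(errors)


rng = random.Random(0)
train_obs, train_targets = make_batch(500, True, rng)   # scoreboard tracks progress
test_obs, test_targets = make_batch(500, False, rng)    # correlation is broken

weights = fit_least_squares(train_obs, train_targets)
train_mse = mse(weights, train_obs, train_targets)
test_mse = mse(weights, test_obs, test_targets)

print("weights (relevant, scoreboard):", weights)
print("train MSE:", train_mse)
print("test MSE:", test_mse)
```

Because the scoreboard is the cleaner signal during training, the fit puts its weight there rather than on the relevant feature, so the error rises sharply once the scoreboard stops tracking progress. The sketch does not address the overparametrization claim, which would need an actual policy network to examine.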