We investigate the use of animal videos (observations) to improve Reinforcement Learning (RL) efficiency and performance in navigation tasks with sparse rewards. Motivated by theoretical considerations, we make use of weighted policy optimization for off-policy RL and describe the main challenges when learning from animal videos. We propose solutions and test our ideas on a series of 2D navigation tasks. We show how our methods can leverage animal videos to improve performance over RL algorithms that do not leverage such observations.
View on arXiv