252

Hierarchical Deep Q-Network with Forgetting from Imperfect Demonstrations in Minecraft

Abstract

We present hierarchical Deep Q-Network with Forgetting (HDQF) that took first place in MineRL competition. HDQF works on imperfect demonstrations utilize hierarchical structure of expert trajectories extracting effective sequence of meta-actions and subgoals. We introduce structured task dependent replay buffer and forgetting technique that allow the HDQF agent to gradually erase poor-quality expert data from the buffer. In this paper we present the details of the HDQF algorithm and give the experimental results in Minecraft domain.

View on arXiv
Comments on this paper