Achieving Sample-Efficient Learning of Long-Horizon Sparse-Reward Robotic Tasks with Base Controllers

IEEE Transactions on Neural Networks and Learning Systems (IEEE TNNLS), 2020
Abstract

Applying Deep Reinforcement Learning (DRL) algorithms to robotic tasks faces many challenges. On the one hand, reward shaping for complex tasks that involve multiple stages in sequence is difficult and may lead to sub-optimal performance. On the other hand, a sparse-reward setting renders exploration inefficient, and exploration with physical robots is costly and unsafe. In this paper we propose a method for learning long-horizon sparse-reward tasks that utilizes one or more existing base controllers. Built upon Deep Deterministic Policy Gradient (DDPG), our algorithm incorporates the controllers into the stages of exploration, policy update, and, most importantly, learning a heuristic value function that naturally interpolates along task trajectories. Through experiments ranging from stacking blocks to stacking cups, we present a straightforward way of synthesizing these controllers, and show that the learned state-based or image-based policies steadily outperform them. Compared to previous work on learning from demonstrations, our method improves sample efficiency by orders of magnitude. Overall, our method has the potential to leverage existing industrial robot manipulation systems to build more flexible and intelligent controllers.
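One of the mechanisms the abstract mentions is incorporating base controllers into the exploration stage of DDPG. A minimal sketch of one common way to do this is shown below; the function and parameter names (`select_action`, `beta`, `noise_scale`) are illustrative assumptions, not the paper's actual implementation. The idea: with some probability, act with an existing base controller instead of the learned actor, so that trajectories reaching the sparse reward enter the replay buffer early in training.

```python
import random

def select_action(state, actor, base_controllers, beta=0.3, noise_scale=0.1):
    """Hypothetical controller-guided exploration for DDPG (illustrative only).

    With probability `beta`, execute one of the existing base controllers;
    otherwise use the learned actor with additive Gaussian exploration noise,
    as in standard DDPG.
    """
    if random.random() < beta:
        # Roll out an existing base controller to seed useful experience.
        controller = random.choice(base_controllers)
        return controller(state)
    # Learned policy with exploration noise.
    action = actor(state)
    return [a + random.gauss(0.0, noise_scale) for a in action]
```

In practice `beta` would typically be annealed toward zero as the learned policy starts to outperform the base controllers.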
