PAGAR: Taming Reward Misalignment in Inverse Reinforcement
Learning-Based Imitation Learning with Protagonist Antagonist Guided
Adversarial Reward
Many imitation learning (IL) algorithms employ inverse reinforcement learning (IRL) to infer the reward function that an expert is implicitly optimizing, based on the expert's demonstrated behavior. However, a misalignment between the inferred reward and the true task objective can result in task failures. In this paper, we introduce Protagonist Antagonist Guided Adversarial Reward (PAGAR), a semi-supervised reward design paradigm that tackles this reward misalignment problem in IRL-based IL. We identify conditions on the candidate reward functions under which PAGAR is guaranteed to induce a policy that succeeds at the underlying task. Furthermore, we present a practical on-and-off-policy approach to implementing PAGAR in IRL-based IL. Experimental results show that our algorithm outperforms competitive baselines on complex IL tasks and on zero-shot IL tasks in transfer environments with limited demonstrations.
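As an illustrative sketch of the paradigm described above (the notation here is assumed for exposition, not quoted from the paper): let $R$ denote a set of candidate reward functions consistent with the demonstrations, and let $U_r(\pi)$ denote the expected return of policy $\pi$ under reward $r$. The protagonist-antagonist idea can be read as a minimax-regret problem, in which a protagonist policy $\pi_P$ is trained to keep its regret small against every candidate reward, where regret under a given $r$ is measured relative to the best antagonist policy $\pi_A$ for that reward:
\[
\pi_P^{*} \in \arg\min_{\pi_P} \; \max_{r \in R} \Big( \max_{\pi_A} U_r(\pi_A) \;-\; U_r(\pi_P) \Big).
\]
Intuitively, a protagonist that achieves low regret under every plausible reward in $R$ cannot be badly misled by any single misaligned candidate, which is the failure mode the abstract refers to as reward misalignment.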