RIZE: Regularized Imitation Learning via Distributional Reinforcement Learning

27 February 2025

Abstract

We introduce a novel Inverse Reinforcement Learning (IRL) approach that overcomes limitations of fixed reward assignments and constrained flexibility in implicit reward regularization. By extending the Maximum Entropy IRL framework with a squared temporal-difference (TD) regularizer and adaptive targets, dynamically adjusted during training, our method indirectly optimizes a reward function while incorporating reinforcement learning principles. Furthermore, we integrate distributional RL to capture richer return information. Our approach achieves state-of-the-art performance on challenging MuJoCo tasks, demonstrating expert-level results on the Humanoid task with only 3 demonstrations. Extensive experiments and ablation studies validate the effectiveness of our method, providing insights into adaptive targets and reward dynamics in imitation learning.

View on arXiv

@article{karimi2025_2502.20089,
  title={ RIZE: Regularized Imitation Learning via Distributional Reinforcement Learning },
  author={ Adib Karimi and Mohammad Mehdi Ebadzadeh },
  journal={arXiv preprint arXiv:2502.20089},
  year={ 2025 }
}

Comments on this paper