
Reward Generation via Large Vision-Language Model in Offline Reinforcement Learning

Abstract

In offline reinforcement learning (RL), learning from fixed datasets offers a promising solution for domains where real-time interaction with the environment is expensive or risky. However, designing dense reward signals for offline datasets requires significant human effort and domain expertise. Reinforcement learning with human feedback (RLHF) has emerged as an alternative, but it remains costly due to the human-in-the-loop process, prompting interest in automated reward generation. To address this, we propose Reward Generation via Large Vision-Language Models (RG-VLM), which leverages the reasoning capabilities of LVLMs to generate rewards from offline data without human involvement. RG-VLM improves generalization in long-horizon tasks and can be seamlessly integrated with sparse reward signals to enhance task performance, demonstrating its potential as an auxiliary reward signal.
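
As a rough illustration of how such an approach could be applied, the Python sketch below relabels an offline trajectory by querying a vision-language model for a scalar progress score on each image observation and adding it to the recorded sparse reward as an auxiliary term. The abstract does not specify the paper's prompting scheme, score scale, or weighting; the lvlm_scorer callable, the Transition container, and the aux_weight coefficient are illustrative assumptions, not the authors' implementation.

from dataclasses import dataclass
from typing import Callable, List, Sequence

import numpy as np


@dataclass
class Transition:
    # One step of an offline trajectory: an RGB observation, the action taken,
    # and the sparse reward stored in the dataset. (Illustrative container,
    # not taken from the paper.)
    observation: np.ndarray
    action: np.ndarray
    sparse_reward: float


def relabel_with_lvlm(
    trajectory: Sequence[Transition],
    task_description: str,
    lvlm_scorer: Callable[[np.ndarray, str], float],
    aux_weight: float = 0.1,
) -> List[float]:
    # lvlm_scorer stands in for a call to a large vision-language model that
    # takes an image and a textual task description and returns a scalar
    # progress score (assumed to lie in [0, 1]). The score is added to the
    # sparse dataset reward as an auxiliary signal, mirroring the abstract's
    # claim that RG-VLM rewards can complement sparse rewards.
    relabeled = []
    for step in trajectory:
        vlm_reward = lvlm_scorer(step.observation, task_description)
        relabeled.append(step.sparse_reward + aux_weight * vlm_reward)
    return relabeled

Under these assumptions, the relabeled rewards would be written back into the fixed dataset before training a standard offline RL algorithm, so no environment interaction or human annotation is needed at training time.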

@article{lee2025_2504.08772,
  title={Reward Generation via Large Vision-Language Model in Offline Reinforcement Learning},
  author={Younghwan Lee and Tung M. Luu and Donghoon Lee and Chang D. Yoo},
  journal={arXiv preprint arXiv:2504.08772},
  year={2025}
}