Generalist World Model Pre-Training for Efficient Reinforcement Learning

26 February 2025
Yi Zhao
Aidan Scannell
Yuxin Hou
Tianyu Cui
Le Chen
Dieter Büchler
Arno Solin
Juho Kannala
Joni Pajarinen
OffRL · OnRL
Abstract

Sample-efficient robot learning is a longstanding goal in robotics. Inspired by the success of scaling in vision and language, the robotics community is now investigating large-scale offline datasets for robot learning. However, existing methods often require expert and/or reward-labeled, task-specific data, which is costly to collect and limits their applicability in practice. In this paper, we consider a more realistic setting in which the offline data consists of reward-free, non-expert, multi-embodiment trajectories. We show that generalist world model pre-training (WPT), together with retrieval-based experience rehearsal and execution guidance, enables efficient reinforcement learning (RL) and fast task adaptation from such non-curated data. In experiments over 72 visuomotor tasks, spanning 6 embodiments and covering hard exploration, complex dynamics, and varied visual properties, WPT achieves 35.65% and 35% higher aggregated scores than two widely used learning-from-scratch baselines, respectively.

@article{zhao2025_2502.19544,
  title={Generalist World Model Pre-Training for Efficient Reinforcement Learning},
  author={Yi Zhao and Aidan Scannell and Yuxin Hou and Tianyu Cui and Le Chen and Dieter Büchler and Arno Solin and Juho Kannala and Joni Pajarinen},
  journal={arXiv preprint arXiv:2502.19544},
  year={2025}
}