The Surprising Difficulty of Search in Model-Based Reinforcement Learning
Wei-Di Chang
Mikael Henaff
Brandon Amos
Gregory Dudek
Scott Fujimoto
Main: 8 pages · Appendix: 17 pages · Bibliography: 4 pages · 9 figures · 13 tables
Abstract
This paper investigates search in model-based reinforcement learning (RL). Conventional wisdom holds that long-horizon prediction and compounding model errors are the primary obstacles to model-based RL. We challenge this view, showing that search is not a plug-and-play replacement for a learned policy: surprisingly, search can harm performance even when the model is highly accurate. Instead, we show that mitigating distribution shift matters more than improving model or value-function accuracy. Building on this insight, we identify key techniques for enabling effective search, achieving state-of-the-art performance across multiple popular benchmark domains.
