
An Empirical Study on the Effectiveness of Incorporating Offline RL As Online RL Subroutines

Jianhai Su
Jinzhu Luo
Qi Zhang
Main: 7 pages · Appendix: 12 pages · Bibliography: 2 pages · 8 figures · 17 tables
Abstract

We take the novel perspective of incorporating offline RL algorithms as subroutines of tabula rasa online RL. This is feasible because an online learning agent can repurpose its historical interactions as an offline dataset. We formalize this idea into a framework that accommodates several variants of offline RL incorporation, such as final policy recommendation and online fine-tuning. We further introduce convenient techniques that improve its effectiveness in enhancing online learning efficiency. Our extensive and systematic empirical analyses show that 1) the effectiveness of the proposed framework depends strongly on the nature of the task, 2) our proposed techniques greatly enhance its effectiveness, and 3) existing online fine-tuning methods are overall ineffective, calling for more research in this direction.
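The core loop described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's actual implementation: all names (`env`, `online_agent`, `offline_algo`, and their methods) are hypothetical placeholders. The online agent logs every transition; the log is periodically handed to an offline RL algorithm as a dataset, and the resulting policy is used either to warm-start the online agent (online fine-tuning) or as the final recommended policy.

```python
def online_rl_with_offline_subroutine(env, online_agent, offline_algo,
                                      total_steps, offline_every):
    """Sketch of offline RL as a subroutine of tabula rasa online RL.

    All interfaces here are illustrative assumptions, not the paper's API.
    """
    dataset = []  # the agent's historical interactions, reused as offline data
    obs = env.reset()
    for step in range(1, total_steps + 1):
        action = online_agent.act(obs)
        next_obs, reward, done = env.step(action)
        dataset.append((obs, action, reward, next_obs, done))
        online_agent.update(dataset[-1])  # ordinary online update
        obs = env.reset() if done else next_obs

        # Subroutine call: run an offline RL algorithm on the logged data.
        if step % offline_every == 0:
            offline_policy = offline_algo(dataset)
            # Variant: online fine-tuning -- warm-start the online agent
            # from the offline-trained policy and continue learning.
            online_agent.load_policy(offline_policy)

    # Variant: final policy recommendation -- return the policy trained
    # offline on all historical interactions.
    return offline_algo(dataset)
```

In the "final policy recommendation" variant, the offline-trained policy is what the agent ultimately deploys; in the "online fine-tuning" variant, it only initializes further online learning.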
