2503.19612
Cited By
RL-finetuning LLMs from on- and off-policy data with a single algorithm
25 March 2025
Yunhao Tang
Taco Cohen
David W. Zhang
Michal Valko
Rémi Munos
OffRL
Papers citing
"RL-finetuning LLMs from on- and off-policy data with a single algorithm"
No papers