2503.14286
Cited By
Tapered Off-Policy REINFORCE: Stable and efficient reinforcement learning for LLMs
18 March 2025
Nicolas Le Roux
Marc G. Bellemare
Jonathan Lebensold
Arnaud Bergeron
Joshua Greaves
Alex Fréchette
Carolyne Pelletier
Eric Thibodeau-Laufer
Sándor Toth
Sam Work
OffRL
Papers citing
"Tapered Off-Policy REINFORCE: Stable and efficient reinforcement learning for LLMs"
No papers