Asynchronous RLHF: Faster and More Efficient Off-Policy RL for Language Models

23 October 2024
Authors: Michael Noukhovitch, Shengyi Huang, Sophie Xhonneux, Arian Hosseini, Rishabh Agarwal, Aaron C. Courville
Tags: OffRL

Papers citing "Asynchronous RLHF: Faster and More Efficient Off-Policy RL for Language Models" (3 of 3 papers shown)
  1. StreamRL: Scalable, Heterogeneous, and Elastic RL for LLMs with Disaggregated Stream Generation (22 Apr 2025)
     Authors: Yinmin Zhong, Zili Zhang, Xiaoniu Song, Hanpeng Hu, Chao Jin, ..., Changyi Wan, Hongyu Zhou, Yimin Jiang, Yibo Zhu, Daxin Jiang
     Tags: OffRL, AI4TS
  2. Tapered Off-Policy REINFORCE: Stable and Efficient Reinforcement Learning for LLMs (18 Mar 2025)
     Authors: Nicolas Le Roux, Marc G. Bellemare, Jonathan Lebensold, Arnaud Bergeron, Joshua Greaves, Alex Fréchette, Carolyne Pelletier, Eric Thibodeau-Laufer, Sándor Toth, Sam Work
     Tags: OffRL
  3. Soft Policy Optimization: Online Off-Policy RL for Sequence Models (07 Mar 2025)
     Authors: Taco Cohen, David W. Zhang, Kunhao Zheng, Yunhao Tang, Rémi Munos, Gabriel Synnaeve
     Tags: OffRL