arXiv:2310.00036
Cleanba: A Reproducible and Efficient Distributed Reinforcement Learning Platform
29 September 2023
Shengyi Huang
Jiayi Weng
Rujikorn Charakorn
Min-Bin Lin
Zhongwen Xu
Santiago Ontañón
Papers citing "Cleanba: A Reproducible and Efficient Distributed Reinforcement Learning Platform"
Asynchronous RLHF: Faster and More Efficient Off-Policy RL for Language Models
Michael Noukhovitch
Shengyi Huang
Sophie Xhonneux
Arian Hosseini
Rishabh Agarwal
Aaron C. Courville
23 Oct 2024
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
04 Mar 2022