Title |
---|
![]() Offline Regularised Reinforcement Learning for Large Language Models
Alignment Pierre Harvey Richemond Yunhao Tang Daniel Guo Daniele Calandriello M. G. Azar ...Gil Shamir Rishabh Joshi Tianqi Liu Rémi Munos Bilal Piot |
![]() Human Alignment of Large Language Models through Online Preference
Optimisation Daniele Calandriello Daniel Guo Rémi Munos Mark Rowland Yunhao Tang ...Michal Valko Tianqi Liu Rishabh Joshi Zeyu Zheng Bilal Piot |