Title |
---|
Towards a Unified View of Preference Learning for Large Language Models: A Survey. Bofei Gao, Feifan Song, Yibo Miao, Zefan Cai, Z. Yang, ..., Houfeng Wang, Zhifang Sui, Peiyi Wang, Baobao Chang |
Offline Regularised Reinforcement Learning for Large Language Models Alignment. Pierre Harvey Richemond, Yunhao Tang, Daniel Guo, Daniele Calandriello, M. G. Azar, ..., Gil Shamir, Rishabh Joshi, Tianqi Liu, Rémi Munos, Bilal Piot |