Title |
---|
![]() Bridging and Modeling Correlations in Pairwise Data for Direct Preference Optimization Yuxin Jiang Bo Huang Yufei Wang Xingshan Zeng Liangyou Li Yasheng Wang Xin Jiang Lifeng Shang Ruiming Tang Wei Wang |
![]() Offline Regularised Reinforcement Learning for Large Language Models
Alignment Pierre Harvey Richemond Yunhao Tang Daniel Guo Daniele Calandriello M. G. Azar ...Gil Shamir Rishabh Joshi Tianqi Liu Rémi Munos Bilal Piot |