Title | Authors
---|---
Human Alignment of Large Language Models through Online Preference Optimisation | Daniele Calandriello, Daniel Guo, Rémi Munos, Mark Rowland, Yunhao Tang, ..., Michal Valko, Tianqi Liu, Rishabh Joshi, Zeyu Zheng, Bilal Piot
Nash Learning from Human Feedback | Rémi Munos, Michal Valko, Daniele Calandriello, M. G. Azar, Mark Rowland, ..., Nikola Momchev, Olivier Bachem, D. Mankowitz, Doina Precup, Bilal Piot
Calibrating Likelihoods towards Consistency in Summarization Models | Polina Zablotskaia, Misha Khalman, Rishabh Joshi, Livio Baldini Soares, Shoshana Jakobovits, Joshua Maynez, Shashi Narayan