Title |
---|
Multi-turn Reinforcement Learning from Preference Human Feedback. Lior Shani, Aviv Rosenberg, Asaf B. Cassel, Oran Lang, Daniele Calandriello, ..., Bilal Piot, Idan Szpektor, Avinatan Hassidim, Yossi Matias, Rémi Munos |
Human Alignment of Large Language Models through Online Preference Optimisation. Daniele Calandriello, Daniel Guo, Rémi Munos, Mark Rowland, Yunhao Tang, ..., Michal Valko, Tianqi Liu, Rishabh Joshi, Zeyu Zheng, Bilal Piot |