Title |
---|
Offline Regularised Reinforcement Learning for Large Language Models Alignment Pierre Harvey Richemond Yunhao Tang Daniel Guo Daniele Calandriello M. G. Azar ...Gil Shamir Rishabh Joshi Tianqi Liu Rémi Munos Bilal Piot |
Human Alignment of Large Language Models through Online Preference Optimisation Daniele Calandriello Daniel Guo Rémi Munos Mark Rowland Yunhao Tang ...Michal Valko Tianqi Liu Rishabh Joshi Zeyu Zheng Bilal Piot |
Mastering Stacking of Diverse Shapes with Large-Scale Iterative Reinforcement Learning on Real Robots Thomas Lampe A. Abdolmaleki Sarah Bechtle Sandy H. Huang Jost Tobias Springenberg ...Markus Wulfmeier Jingwei Zhang Francesco Nori N. Heess Martin Riedmiller |