All Papers

Title |
|---|
Faster WIND: Accelerating Iterative Best-of-N Distillation for LLM Alignment. International Conference on Artificial Intelligence and Statistics (AISTATS), 2024 |
Unintentional Unalignment: Likelihood Displacement in Direct Preference Optimization. International Conference on Learning Representations (ICLR), 2024 |
Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF. International Conference on Learning Representations (ICLR), 2024 |
The Central Role of the Loss Function in Reinforcement Learning. Kaiwen Wang, Nathan Kallus, Wen Sun |
Offline Regularised Reinforcement Learning for Large Language Models Alignment. Pierre Harvey Richemond, Yunhao Tang, Daniel Guo, Daniele Calandriello, M. G. Azar, ... Gil Shamir, Rishabh Joshi, Tianqi Liu, Rémi Munos, Bilal Piot |
Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization. Annual Meeting of the Association for Computational Linguistics (ACL), 2023 |