On the Limited Generalization Capability of the Implicit Reward Model
Induced by Direct Preference Optimization

5 September 2024

Maartje ter Hoeve

Katherine Metcalf

Tong Zhang

Papers citing "On the Limited Generalization Capability of the Implicit Reward Model Induced by Direct Preference Optimization"

DeformPAM: Data-Efficient Learning for Long-horizon Deformable Object Manipulation via Preference-based Action Alignment. Wendi Chen, Han Xue, Fangyuan Zhou, Yuan Fang, Cewu Lu. 15 Oct 2024.

Regularizing Hidden States Enables Learning Generalizable Reward Model for LLMs. Rui Yang, Ruomeng Ding, Yong Lin, Huan Zhang, Tong Zhang. 14 Jun 2024.

Robust Preference Optimization through Reward Model Distillation. Adam Fisch, Jacob Eisenstein, Vicky Zayats, Alekh Agarwal, Ahmad Beirami, Chirag Nagpal, Peter Shaw, Jonathan Berant. 29 May 2024.

Training language models to follow instructions with human feedback. Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, ..., Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan J. Lowe. 04 Mar 2022.