ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2409.03650
  4. Cited By
On the Limited Generalization Capability of the Implicit Reward Model
  Induced by Direct Preference Optimization

On the Limited Generalization Capability of the Implicit Reward Model Induced by Direct Preference Optimization

5 September 2024
Yong Lin
Skyler Seto
Maartje ter Hoeve
Katherine Metcalf
B. Theobald
Xuan Wang
Yizhe Zhang
Chen Huang
Tong Zhang
ArXivPDFHTML

Papers citing "On the Limited Generalization Capability of the Implicit Reward Model Induced by Direct Preference Optimization"

4 / 4 papers shown
Title
DeformPAM: Data-Efficient Learning for Long-horizon Deformable Object Manipulation via Preference-based Action Alignment
DeformPAM: Data-Efficient Learning for Long-horizon Deformable Object Manipulation via Preference-based Action Alignment
Wendi Chen
Han Xue
Fangyuan Zhou
Yuan Fang
Cewu Lu
39
0
0
15 Oct 2024
Regularizing Hidden States Enables Learning Generalizable Reward Model
  for LLMs
Regularizing Hidden States Enables Learning Generalizable Reward Model for LLMs
Rui Yang
Ruomeng Ding
Yong Lin
Huan Zhang
Tong Zhang
21
42
0
14 Jun 2024
Robust Preference Optimization through Reward Model Distillation
Robust Preference Optimization through Reward Model Distillation
Adam Fisch
Jacob Eisenstein
Vicky Zayats
Alekh Agarwal
Ahmad Beirami
Chirag Nagpal
Peter Shaw
Jonathan Berant
73
21
0
29 May 2024
Training language models to follow instructions with human feedback
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
303
11,881
0
04 Mar 2022
1