Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2407.13709
Cited By
Understanding Reference Policies in Direct Preference Optimization
18 July 2024
Yixin Liu
Pengfei Liu
Arman Cohan
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Understanding Reference Policies in Direct Preference Optimization"
5 / 5 papers shown
Title
Pre-DPO: Improving Data Utilization in Direct Preference Optimization Using a Guiding Reference Model
Junshu Pan
Wei Shen
Shulin Huang
Qiji Zhou
Yue Zhang
69
0
0
22 Apr 2025
SimPER: A Minimalist Approach to Preference Alignment without Hyperparameters
Teng Xiao
Yige Yuan
Z. Chen
Mingxiao Li
Shangsong Liang
Z. Ren
V. Honavar
84
5
0
21 Feb 2025
Preference Learning Algorithms Do Not Learn Preference Rankings
Angelica Chen
Sadhika Malladi
Lily H. Zhang
Xinyi Chen
Qiuyi Zhang
Rajesh Ranganath
Kyunghyun Cho
20
18
0
29 May 2024
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
301
11,730
0
04 Mar 2022
Classical Structured Prediction Losses for Sequence to Sequence Learning
Sergey Edunov
Myle Ott
Michael Auli
David Grangier
MarcÁurelio Ranzato
AIMat
34
185
0
14 Nov 2017
1