2402.10571
Direct Preference Optimization with an Offset
16 February 2024
Afra Amini
Tim Vieira
Ryan Cotterell
Papers citing "Direct Preference Optimization with an Offset"
4 / 4 papers shown
Title
Unsupervised Visual Chain-of-Thought Reasoning via Preference Optimization
Kesen Zhao
B. Zhu
Qianru Sun
Hanwang Zhang
MLLM
LRM
65
0
0
25 Apr 2025
Improving alignment of dialogue agents via targeted human judgements
Amelia Glaese
Nat McAleese
Maja Trębacz
John Aslanides
Vlad Firoiu
...
John F. J. Mellor
Demis Hassabis
Koray Kavukcuoglu
Lisa Anne Hendricks
G. Irving
ALM
AAML
209
413
0
28 Sep 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
284
8,441
0
04 Mar 2022
Fine-Tuning Language Models from Human Preferences
Daniel M. Ziegler
Nisan Stiennon
Jeff Wu
Tom B. Brown
Alec Radford
Dario Amodei
Paul Christiano
G. Irving
ALM
256
1,151
0
18 Sep 2019