Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2404.03715
Cited By
Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences
4 April 2024
Corby Rosset
Ching-An Cheng
Arindam Mitra
Michael Santacroce
Ahmed Hassan Awadallah
Tengyang Xie
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences"
7 / 7 papers shown
Title
LiPO: Listwise Preference Optimization through Learning-to-Rank
Tianqi Liu
Zhen Qin
Junru Wu
Jiaming Shen
Misha Khalman
...
Mohammad Saleh
Simon Baumgartner
Jialu Liu
Peter J. Liu
Xuanhui Wang
101
37
0
28 Jan 2025
Reward-Augmented Data Enhances Direct Preference Alignment of LLMs
Shenao Zhang
Zhihan Liu
Boyi Liu
Y. Zhang
Yingxiang Yang
Y. Liu
Liyu Chen
Tao Sun
Z. Wang
64
2
0
10 Oct 2024
KTO: Model Alignment as Prospect Theoretic Optimization
Kawin Ethayarajh
Winnie Xu
Niklas Muennighoff
Dan Jurafsky
Douwe Kiela
139
147
0
02 Feb 2024
Self-Rewarding Language Models
Weizhe Yuan
Richard Yuanzhe Pang
Kyunghyun Cho
Xian Li
Sainbayar Sukhbaatar
Jing Xu
Jason Weston
ReLM
SyDa
ALM
LRM
190
133
0
18 Jan 2024
Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models
Avi Singh
John D. Co-Reyes
Rishabh Agarwal
Ankesh Anand
Piyush Patil
...
Yamini Bansal
Ethan Dyer
Behnam Neyshabur
Jascha Narain Sohl-Dickstein
Noah Fiedel
ALM
LRM
ReLM
SyDa
127
66
0
11 Dec 2023
Self-Consistency Improves Chain of Thought Reasoning in Language Models
Xuezhi Wang
Jason W. Wei
Dale Schuurmans
Quoc Le
Ed H. Chi
Sharan Narang
Aakanksha Chowdhery
Denny Zhou
ReLM
BDL
LRM
AI4CE
267
2,029
0
21 Mar 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
298
8,441
0
04 Mar 2022
1