2406.17312
Cited By
Not All Preference Pairs Are Created Equal: A Recipe for Annotation-Efficient Iterative Preference Learning
25 June 2024
Sen Yang
Leyang Cui
Deng Cai
Xinting Huang
Shuming Shi
Wai Lam
Papers citing
"Not All Preference Pairs Are Created Equal: A Recipe for Annotation-Efficient Iterative Preference Learning"
4 / 4 papers shown
Title
Bootstrapping Language Models with DPO Implicit Rewards
Changyu Chen
Zichen Liu
Chao Du
Tianyu Pang
Qian Liu
Arunesh Sinha
Pradeep Varakantham
Min-Bin Lin
SyDa
ALM
60
22
0
14 Jun 2024
Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences
Corby Rosset
Ching-An Cheng
Arindam Mitra
Michael Santacroce
Ahmed Hassan Awadallah
Tengyang Xie
144
113
0
04 Apr 2024
Online Iterative Reinforcement Learning from Human Feedback with General Preference Model
Chen Ye
Wei Xiong
Yuheng Zhang
Nan Jiang
Tong Zhang
OffRL
31
9
0
11 Feb 2024
Understanding Dataset Difficulty with $\mathcal{V}$-Usable Information
Kawin Ethayarajh
Yejin Choi
Swabha Swayamdipta
154
157
0
16 Oct 2021