Not All Preference Pairs Are Created Equal: A Recipe for
Annotation-Efficient Iterative Preference Learning

Not All Preference Pairs Are Created Equal: A Recipe for Annotation-Efficient Iterative Preference Learning

25 June 2024

Sen Yang

Wai Lam

Papers citing "Not All Preference Pairs Are Created Equal: A Recipe for Annotation-Efficient Iterative Preference Learning"

4 / 4 papers shown

Title
Bootstrapping Language Models with DPO Implicit Rewards Changyu Chen Zichen Liu Chao Du Tianyu Pang Qian Liu Arunesh Sinha Pradeep Varakantham Min-Bin Lin SyDa ALM 60 22 0 14 Jun 2024
Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences Corby Rosset Ching-An Cheng Arindam Mitra Michael Santacroce Ahmed Hassan Awadallah Tengyang Xie 144 113 0 04 Apr 2024
Online Iterative Reinforcement Learning from Human Feedback with General Preference Model Chen Ye Wei Xiong Yuheng Zhang Nan Jiang Tong Zhang OffRL 31 9 0 11 Feb 2024
$Understanding Dataset Difficulty with $\mathcal{V}$-Usable Information$ Understanding Dataset Difficulty with $\mathcal{V}$ -Usable Information Kawin Ethayarajh Yejin Choi Swabha Swayamdipta 154 157 0 16 Oct 2021