arXiv:2410.08067
Reward-Augmented Data Enhances Direct Preference Alignment of LLMs
10 October 2024
Shenao Zhang, Zhihan Liu, Boyi Liu, Yanzhe Zhang, Yingxiang Yang, Yunxing Liu, Liyu Chen, Tao Sun, Ziyi Wang
Papers citing "Reward-Augmented Data Enhances Direct Preference Alignment of LLMs" (5 papers)
Alignment through Meta-Weighted Online Sampling: Bridging the Gap between Data Generation and Preference Optimization
Junming Yang, Ning Xu, Biao Liu, Shiqi Qiao, Xin Geng
27 Sep 2025
Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF
International Conference on Learning Representations (ICLR), 2024
Shicong Cen, Jincheng Mei, Katayoon Goshvadi, Hanjun Dai, Tong Yang, Sherry Yang, Dale Schuurmans, Yuejie Chi, Bo Dai
20 Feb 2025
DSTC: Direct Preference Learning with Only Self-Generated Tests and Code to Improve Code LMs
Zhihan Liu, Shenao Zhang, Yongfei Liu, Boyi Liu, Yingxiang Yang, Zhaoran Wang
20 Nov 2024
Online Bandit Learning with Offline Preference Data for Improved RLHF
Akhil Agnihotri, Rahul Jain, Deepak Ramachandran, Zheng Wen
13 Jun 2024
Length-Controlled AlpacaEval: A Simple Way to Debias Automatic Evaluators
Yann Dubois, Balázs Galambosi, Abigail Z. Jacobs, Tatsunori Hashimoto
06 Apr 2024