Reward-Augmented Data Enhances Direct Preference Alignment of LLMs

Versions: v1, v2, v3, v4, v5 (latest)

10 October 2024

arXiv 2410.08067 (abstract, PDF, HTML); also listed on Hugging Face (2 upvotes) and GitHub.

Papers citing "Reward-Augmented Data Enhances Direct Preference Alignment of LLMs"

5 / 5 papers shown

Alignment through Meta-Weighted Online Sampling: Bridging the Gap between Data Generation and Preference Optimization

27 Sep 2025

Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF

International Conference on Learning Representations (ICLR), 2024

Katayoon Goshvadi

Dale Schuurmans

20 Feb 2025

DSTC: Direct Preference Learning with Only Self-Generated Tests and Code to Improve Code LMs

20 Nov 2024

Online Bandit Learning with Offline Preference Data for Improved RLHF

Akhil Agnihotri

Deepak Ramachandran

13 Jun 2024

Length-Controlled AlpacaEval: A Simple Way to Debias Automatic Evaluators

Balázs Galambosi

Abigail Z. Jacobs

Tatsunori Hashimoto

06 Apr 2024
