
Contrastive Preference Learning: Learning from Human Feedback without RL
arXiv: 2310.13639 · v3 (latest)

20 October 2023
Joey Hejna, Rafael Rafailov, Harshit S. Sikchi, Chelsea Finn, S. Niekum, W. B. Knox, Dorsa Sadigh
OffRL
arXiv (abs) · PDF · HTML · HuggingFace (25 upvotes) · GitHub (180★)

Papers citing "Contrastive Preference Learning: Learning from Human Feedback without RL"

50 / 56 papers shown
Humanline: Online Alignment as Perceptual Loss
Sijia Liu, Niklas Muennighoff, Kawin Ethayarajh
OnRL · 30 Mar 2026

Mitigating Length Bias in RLHF through a Causal Lens
Hyeonji Kim, Sujeong Oh, Sanghack Lee
16 Nov 2025

Beyond Single-Reward: Multi-Pair, Multi-Perspective Preference Optimization for Machine Translation
Hao Wang, Linlong Xu, Heng Liu, Y. Liu, Xiaohu Zhao, Bo Zeng, Liangying Shao, Longyue Wang, Weihua Luo, Kaifu Zhang
15 Oct 2025

Predictive Preference Learning from Human Interventions
Haoyuan Cai, Zhenghao Peng, Bolei Zhou
02 Oct 2025

How Well Can Preference Optimization Generalize Under Noisy Feedback?
Shawn Im, Yixuan Li
01 Oct 2025

Preference-Guided Learning for Sparse-Reward Multi-Agent Reinforcement Learning
Viet The Bui, Tien Mai, Hong Thanh Nguyen
OffRL · 26 Sep 2025

Collaborate, Deliberate, Evaluate: How LLM Alignment Affects Coordinated Multi-Agent Outcomes
Abhijnan Nath, Carine Graff, Nikhil Krishnaswamy
LLMAG · 07 Sep 2025

Policy Learning from Large Vision-Language Model Feedback without Reward Modeling
Tung M. Luu, Donghoon Lee, Younghwan Lee, Chang D. Yoo
OffRL · 31 Jul 2025

PITA: Preference-Guided Inference-Time Alignment for LLM Post-Training
Sarat Chandra Bobbili, Ujwal Dinesha, Dheeraj Narasimha, S. Shakkottai
26 Jul 2025

Robot-Gated Interactive Imitation Learning with Adaptive Intervention Mechanism
Haoyuan Cai, Zhenghao Peng, Bolei Zhou
10 Jun 2025

MisoDICE: Multi-Agent Imitation from Unlabeled Mixed-Quality Demonstrations
Viet The Bui, Tien Mai, Hong Thanh Nguyen
OffRL · 24 May 2025

Policy-labeled Preference Learning: Is Preference Enough for RLHF?
Taehyun Cho, Seokhun Ju, Seungyub Han, Dohyeong Kim, Kyungjae Lee, Jungwoo Lee
OffRL · 06 May 2025

Optimal Interactive Learning on the Job via Facility Location Planning
Shivam Vats, Michelle Zhao, Patrick Callaghan, Mingxi Jia, Maxim Likhachev, Oliver Kroemer, George Konidaris
01 May 2025

Direct Post-Training Preference Alignment for Multi-Agent Motion Generation Models Using Implicit Feedback from Pre-training Demonstrations
International Conference on Learning Representations (ICLR), 2025
Ran Tian, Kratarth Goel
25 Mar 2025

One Framework to Rule Them All: Unifying RL-Based and RL-Free Methods in RLHF
Xin Cai
25 Mar 2025

Disentangling Uncertainties by Learning Compressed Data Representation
Conference on Learning for Dynamics & Control (L4DC), 2025
Zhiyu An, Zhibo Hou, Wan Du
UQCV · UD · 20 Mar 2025

Inverse-RLignment: Large Language Model Alignment from Demonstrations through Inverse Reinforcement Learning
Hao Sun, M. Schaar
28 Jan 2025

Direct Preference Optimization for Primitive-Enabled Hierarchical Reinforcement Learning
Utsav Singh, Souradip Chakraborty, Wesley A Suttle, Brian M. Sadler, Derrik E. Asher, Anit Kumar Sahu, Mubarak Shah, Vinay P. Namboodiri, Amrit Singh Bedi
01 Nov 2024

Scalable Reinforcement Post-Training Beyond Static Human Prompts: Evolving Alignment via Asymmetric Self-Play
Ziyu Ye, Rishabh Agarwal, Tianqi Liu, Rishabh Joshi, Sarmishta Velury, Quoc Le, Qijun Tan, Yating Liu
31 Oct 2024

Understanding Layer Significance in LLM Alignment
Guangyuan Shi, Zexin Lu, Xiaoyu Dong, Wenlong Zhang, Xuanyu Zhang, Yujie Feng, Xiao-Ming Wu
23 Oct 2024

TreeBoN: Enhancing Inference-Time Alignment with Speculative Tree-Search and Best-of-N Sampling
Jiahao Qiu, Yifu Lu, Yifan Zeng, Jiacheng Guo, Jiayi Geng, ..., Ling Yang, Kaixuan Huang, Yue Wu, Mengdi Wang
18 Oct 2024

DeformPAM: Data-Efficient Learning for Long-horizon Deformable Object Manipulation via Preference-based Action Alignment
IEEE International Conference on Robotics and Automation (ICRA), 2024
Wendi Chen, Han Xue, Fangyuan Zhou, Yuan Fang, Cewu Lu
15 Oct 2024

X-ALMA: Plug & Play Modules and Adaptive Rejection for Quality Translation at Scale
International Conference on Learning Representations (ICLR), 2024
Haoran Xu, Kenton W. Murray, Philipp Koehn, Hieu T. Hoang, Akiko Eriguchi, Huda Khayrallah
04 Oct 2024

LASeR: Learning to Adaptively Select Reward Models with Multi-Armed Bandits
Duy Nguyen, Archiki Prasad, Elias Stengel-Eskin, Joey Tianyi Zhou
02 Oct 2024

Forward KL Regularized Preference Optimization for Aligning Diffusion Policies
AAAI Conference on Artificial Intelligence (AAAI), 2024
Zhao Shan, Chenyou Fan, Delin Qu, Jiyuan Shi, Chenjia Bai
09 Sep 2024

Towards a Unified View of Preference Learning for Large Language Models: A Survey
Bofei Gao, Feifan Song, Yibo Miao, Zefan Cai, Zhiyong Yang, ..., Houfeng Wang, Zhifang Sui, Peiyi Wang, Baobao Chang
04 Sep 2024

Listwise Reward Estimation for Offline Preference-based Reinforcement Learning
International Conference on Machine Learning (ICML), 2024
Heewoong Choi, Sangwon Jung, Hongjoon Ahn, Taesup Moon
OffRL · 08 Aug 2024

Can DPO Learn Diverse Human Values? A Theoretical Scaling Law
Shawn Im, Yixuan Li
06 Aug 2024

AI Safety in Generative AI Large Language Models: A Survey
Jaymari Chua, Yun Yvonna Li, Shiyi Yang, Chen Wang, Lina Yao
LM&MA · 06 Jul 2024

Safe MPC Alignment with Human Directional Feedback
Zhixian Xie, Wenlong Zhang, Yi Ren, Zhaoran Wang, George J. Pappas, Wanxin Jin
05 Jul 2024

Humor in AI: Massive Scale Crowd-Sourced Preferences and Benchmarks for Cartoon Captioning
Neural Information Processing Systems (NeurIPS), 2024
Jifan Zhang, Lalit P. Jain, Yang Guo, Jiayi Chen, Kuan Lok Zhou, ..., Scott Sievert, Timothy T. Rogers, Kevin Jamieson, Robert Mankoff, Robert Nowak
15 Jun 2024

ContraSolver: Self-Alignment of Language Models by Resolving Internal Preference Contradictions
Xu Zhang, Xunjian Yin, Xiaojun Wan
13 Jun 2024

Self-Play with Adversarial Critic: Provable and Scalable Offline Alignment for Language Models
Xiang Ji, Sanjeev Kulkarni, Mengdi Wang, Tengyang Xie
OffRL · 06 Jun 2024

Preference Alignment with Flow Matching
Minu Kim, Yongsik Lee, Sehyeok Kang, Jihwan Oh, Song Chong, Seyoung Yun
30 May 2024

(Perhaps) Beyond Human Translation: Harnessing Multi-Agent Collaboration for Translating Ultra-Long Literary Texts
Minghao Wu, Jiahao Xu, Yulin Yuan, Gholamreza Haffari, Longyue Wang, Weihua Luo, Kaifu Zhang
LLMAG · 20 May 2024

TRANSIC: Sim-to-Real Policy Transfer by Learning from Online Correction
Conference on Robot Learning (CoRL), 2024
Yunfan Jiang, Chen Wang, Ruohan Zhang, Jiajun Wu, Fei-Fei Li
OnRL · 16 May 2024

Robot Air Hockey: A Manipulation Testbed for Robot Learning with Reinforcement Learning
Caleb Chuck, Carl Qi, M. Munje, Shuozhe Li, Max Rudolph, ..., Kavan Mehta, Anthony Wang, Peter Stone, Amy Zhang, S. Niekum
06 May 2024

A Preference-driven Paradigm for Enhanced Translation with Large Language Models
D. Zhu, Sony Trenous, Xiaoyu Shen, Dietrich Klakow, Bill Byrne, Eva Hasler
17 Apr 2024

Regularized Conditional Diffusion Model for Multi-Task Preference Alignment
Xudong Yu, Chenjia Bai, Haoran He, Changhong Wang, Xuelong Li
07 Apr 2024

Heterogeneous Contrastive Learning for Foundation Models and Beyond
Lecheng Zheng, Baoyu Jing, Zihao Li, Hanghang Tong, Jingrui He
VLM · 30 Mar 2024

Understanding the Learning Dynamics of Alignment with Human Feedback
Shawn Im, Yixuan Li
ALM · 27 Mar 2024

Human Alignment of Large Language Models through Online Preference Optimisation
International Conference on Machine Learning (ICML), 2024
Daniele Calandriello, Daniel Guo, Rémi Munos, Mark Rowland, Yunhao Tang, ..., Michal Valko, Tianqi Liu, Rishabh Joshi, Zeyu Zheng, Bilal Piot
13 Mar 2024

Improving Reinforcement Learning from Human Feedback Using Contrastive Rewards
Wei Shen, Xiaoying Zhang, Yuanshun Yao, Rui Zheng, Hongyi Guo, Yang Liu
ALM · 12 Mar 2024

On the Essence and Prospect: An Investigation of Alignment Approaches for Big Models
Xinpeng Wang, Shitong Duan, Xiaoyuan Yi, Jing Yao, Shanlin Zhou, Zhihua Wei, Peng Zhang, Dongkuan Xu, Maosong Sun, Xing Xie
OffRL · 07 Mar 2024

Reward Model Learning vs. Direct Policy Optimization: A Comparative Analysis of Learning from Human Preferences
Andi Nika, Debmalya Mandal, Parameswaran Kamalaruban, Georgios Tzannetos, Goran Radanović, Adish Singla
04 Mar 2024

Batch Active Learning of Reward Functions from Human Preferences
Erdem Biyik, Nima Anari, Dorsa Sadigh
24 Feb 2024

A Dense Reward View on Aligning Text-to-Image Diffusion with Preference
Shentao Yang, Tianqi Chen, Mingyuan Zhou
EGVM · 13 Feb 2024

"Task Success" is not Enough: Investigating the Use of Video-Language
  Models as Behavior Critics for Catching Undesirable Agent Behaviors
"Task Success" is not Enough: Investigating the Use of Video-Language Models as Behavior Critics for Catching Undesirable Agent Behaviors
L. Guan
Yifan Zhou
Denis Liu
Yantian Zha
H. B. Amor
Subbarao Kambhampati
LM&Ro
353
29
0
06 Feb 2024
YODA: Teacher-Student Progressive Learning for Language Models
Jianqiao Lu, Wanjun Zhong, Yufei Wang, Zhijiang Guo, Qi Zhu, ..., Baojun Wang, Yasheng Wang, Lifeng Shang, Xin Jiang, Qun Liu
LRM · 28 Jan 2024

Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation
International Conference on Machine Learning (ICML), 2024
Haoran Xu, Amr Sharaf, Yunmo Chen, Weiting Tan, Lingfeng Shen, Benjamin Van Durme, Kenton W. Murray, Young Jin Kim
ALM · 16 Jan 2024

Page 1 of 2