
Contrastive Preference Learning: Learning from Human Feedback without RL
arXiv: 2310.13639 · v3 (latest)

20 October 2023
Joey Hejna, Rafael Rafailov, Harshit S. Sikchi, Chelsea Finn, S. Niekum, W. B. Knox, Dorsa Sadigh
OffRL
arXiv (abs) · PDF · HTML · HuggingFace (25 upvotes) · GitHub (180★)

Papers citing "Contrastive Preference Learning: Learning from Human Feedback without RL"

50 / 56 papers shown
Humanline: Online Alignment as Perceptual Loss
Sijia Liu, Niklas Muennighoff, Kawin Ethayarajh
OnRL · 30 Mar 2026

Mitigating Length Bias in RLHF through a Causal Lens
Hyeonji Kim, Sujeong Oh, Sanghack Lee
16 Nov 2025

Beyond Single-Reward: Multi-Pair, Multi-Perspective Preference Optimization for Machine Translation
Hao Wang, Linlong Xu, Heng Liu, Y. Liu, Xiaohu Zhao, Bo Zeng, Liangying Shao, Longyue Wang, Weihua Luo, Kaifu Zhang
15 Oct 2025

Predictive Preference Learning from Human Interventions
Haoyuan Cai, Zhenghao Peng, Bolei Zhou
02 Oct 2025

How Well Can Preference Optimization Generalize Under Noisy Feedback?
Shawn Im, Yixuan Li
01 Oct 2025

Preference-Guided Learning for Sparse-Reward Multi-Agent Reinforcement Learning
Viet The Bui, Tien Mai, Hong Thanh Nguyen
OffRL · 26 Sep 2025

Collaborate, Deliberate, Evaluate: How LLM Alignment Affects Coordinated Multi-Agent Outcomes
Abhijnan Nath, Carine Graff, Nikhil Krishnaswamy
LLMAG · 07 Sep 2025

Policy Learning from Large Vision-Language Model Feedback without Reward Modeling
Tung M. Luu, Donghoon Lee, Younghwan Lee, Chang D. Yoo
OffRL · 31 Jul 2025

PITA: Preference-Guided Inference-Time Alignment for LLM Post-Training
Sarat Chandra Bobbili, Ujwal Dinesha, Dheeraj Narasimha, S. Shakkottai
26 Jul 2025

Robot-Gated Interactive Imitation Learning with Adaptive Intervention Mechanism
Haoyuan Cai, Zhenghao Peng, Bolei Zhou
10 Jun 2025

MisoDICE: Multi-Agent Imitation from Unlabeled Mixed-Quality Demonstrations
Viet The Bui, Tien Mai, Hong Thanh Nguyen
OffRL · 24 May 2025

Policy-labeled Preference Learning: Is Preference Enough for RLHF?
Taehyun Cho, Seokhun Ju, Seungyub Han, Dohyeong Kim, Kyungjae Lee, Jungwoo Lee
OffRL · 06 May 2025

Optimal Interactive Learning on the Job via Facility Location Planning
Shivam Vats, Michelle Zhao, Patrick Callaghan, Mingxi Jia, Maxim Likhachev, Oliver Kroemer, George Konidaris
01 May 2025

Direct Post-Training Preference Alignment for Multi-Agent Motion Generation Models Using Implicit Feedback from Pre-training Demonstrations
International Conference on Learning Representations (ICLR), 2025
Ran Tian, Kratarth Goel
25 Mar 2025

One Framework to Rule Them All: Unifying RL-Based and RL-Free Methods in RLHF
Xin Cai
25 Mar 2025

Disentangling Uncertainties by Learning Compressed Data Representation
Conference on Learning for Dynamics & Control (L4DC), 2025
Zhiyu An, Zhibo Hou, Wan Du
UQCV · UD · 20 Mar 2025

Inverse-RLignment: Large Language Model Alignment from Demonstrations through Inverse Reinforcement Learning
Hao Sun, M. Schaar
28 Jan 2025

Direct Preference Optimization for Primitive-Enabled Hierarchical Reinforcement Learning
Utsav Singh, Souradip Chakraborty, Wesley A Suttle, Brian M. Sadler, Derrik E. Asher, Anit Kumar Sahu, Mubarak Shah, Vinay P. Namboodiri, Amrit Singh Bedi
01 Nov 2024

Scalable Reinforcement Post-Training Beyond Static Human Prompts: Evolving Alignment via Asymmetric Self-Play
Ziyu Ye, Rishabh Agarwal, Tianqi Liu, Rishabh Joshi, Sarmishta Velury, Quoc Le, Qijun Tan, Yating Liu
31 Oct 2024

Understanding Layer Significance in LLM Alignment
Guangyuan Shi, Zexin Lu, Xiaoyu Dong, Wenlong Zhang, Xuanyu Zhang, Yujie Feng, Xiao-Ming Wu
23 Oct 2024

TreeBoN: Enhancing Inference-Time Alignment with Speculative Tree-Search and Best-of-N Sampling
Jiahao Qiu, Yifu Lu, Yifan Zeng, Jiacheng Guo, Jiayi Geng, ..., Ling Yang, Kaixuan Huang, Yue Wu, Mengdi Wang
18 Oct 2024

DeformPAM: Data-Efficient Learning for Long-horizon Deformable Object Manipulation via Preference-based Action Alignment
IEEE International Conference on Robotics and Automation (ICRA), 2024
Wendi Chen, Han Xue, Fangyuan Zhou, Yuan Fang, Cewu Lu
15 Oct 2024

X-ALMA: Plug & Play Modules and Adaptive Rejection for Quality Translation at Scale
International Conference on Learning Representations (ICLR), 2024
Haoran Xu, Kenton W. Murray, Philipp Koehn, Hieu T. Hoang, Akiko Eriguchi, Huda Khayrallah
04 Oct 2024

LASeR: Learning to Adaptively Select Reward Models with Multi-Armed Bandits
Duy Nguyen, Archiki Prasad, Elias Stengel-Eskin, Joey Tianyi Zhou
02 Oct 2024

Forward KL Regularized Preference Optimization for Aligning Diffusion Policies
AAAI Conference on Artificial Intelligence (AAAI), 2024
Zhao Shan, Chenyou Fan, Delin Qu, Jiyuan Shi, Chenjia Bai
09 Sep 2024

Towards a Unified View of Preference Learning for Large Language Models: A Survey
Bofei Gao, Feifan Song, Yibo Miao, Zefan Cai, Zhiyong Yang, ..., Houfeng Wang, Zhifang Sui, Peiyi Wang, Baobao Chang
04 Sep 2024

Listwise Reward Estimation for Offline Preference-based Reinforcement Learning
International Conference on Machine Learning (ICML), 2024
Heewoong Choi, Sangwon Jung, Hongjoon Ahn, Taesup Moon
OffRL · 08 Aug 2024

Can DPO Learn Diverse Human Values? A Theoretical Scaling Law
Shawn Im, Yixuan Li
06 Aug 2024

AI Safety in Generative AI Large Language Models: A Survey
Jaymari Chua, Yun Yvonna Li, Shiyi Yang, Chen Wang, Lina Yao
LM&MA · 06 Jul 2024

Safe MPC Alignment with Human Directional Feedback
Zhixian Xie, Wenlong Zhang, Yi Ren, Zhaoran Wang, George J. Pappas, Wanxin Jin
05 Jul 2024

Humor in AI: Massive Scale Crowd-Sourced Preferences and Benchmarks for Cartoon Captioning
Neural Information Processing Systems (NeurIPS), 2024
Jifan Zhang, Lalit P. Jain, Yang Guo, Jiayi Chen, Kuan Lok Zhou, ..., Scott Sievert, Timothy T. Rogers, Kevin Jamieson, Robert Mankoff, Robert Nowak
15 Jun 2024

ContraSolver: Self-Alignment of Language Models by Resolving Internal Preference Contradictions
Xu Zhang, Xunjian Yin, Xiaojun Wan
13 Jun 2024

Self-Play with Adversarial Critic: Provable and Scalable Offline Alignment for Language Models
Xiang Ji, Sanjeev Kulkarni, Mengdi Wang, Tengyang Xie
OffRL · 06 Jun 2024

Preference Alignment with Flow Matching
Minu Kim, Yongsik Lee, Sehyeok Kang, Jihwan Oh, Song Chong, Seyoung Yun
30 May 2024

(Perhaps) Beyond Human Translation: Harnessing Multi-Agent Collaboration for Translating Ultra-Long Literary Texts
Minghao Wu, Jiahao Xu, Yulin Yuan, Gholamreza Haffari, Longyue Wang, Weihua Luo, Kaifu Zhang
LLMAG · 20 May 2024

TRANSIC: Sim-to-Real Policy Transfer by Learning from Online Correction
Conference on Robot Learning (CoRL), 2024
Yunfan Jiang, Chen Wang, Ruohan Zhang, Jiajun Wu, Fei-Fei Li
OnRL · 16 May 2024

Robot Air Hockey: A Manipulation Testbed for Robot Learning with Reinforcement Learning
Caleb Chuck, Carl Qi, M. Munje, Shuozhe Li, Max Rudolph, ..., Kavan Mehta, Anthony Wang, Peter Stone, Amy Zhang, S. Niekum
06 May 2024

A Preference-driven Paradigm for Enhanced Translation with Large Language Models
D. Zhu, Sony Trenous, Xiaoyu Shen, Dietrich Klakow, Bill Byrne, Eva Hasler
17 Apr 2024

Regularized Conditional Diffusion Model for Multi-Task Preference Alignment
Xudong Yu, Chenjia Bai, Haoran He, Changhong Wang, Xuelong Li
07 Apr 2024

Heterogeneous Contrastive Learning for Foundation Models and Beyond
Lecheng Zheng, Baoyu Jing, Zihao Li, Hanghang Tong, Jingrui He
VLM · 30 Mar 2024

Understanding the Learning Dynamics of Alignment with Human Feedback
Shawn Im, Yixuan Li
ALM · 27 Mar 2024

Human Alignment of Large Language Models through Online Preference Optimisation
International Conference on Machine Learning (ICML), 2024
Daniele Calandriello, Daniel Guo, Rémi Munos, Mark Rowland, Yunhao Tang, ..., Michal Valko, Tianqi Liu, Rishabh Joshi, Zeyu Zheng, Bilal Piot
13 Mar 2024

Improving Reinforcement Learning from Human Feedback Using Contrastive Rewards
Wei Shen, Xiaoying Zhang, Yuanshun Yao, Rui Zheng, Hongyi Guo, Yang Liu
ALM · 12 Mar 2024

On the Essence and Prospect: An Investigation of Alignment Approaches for Big Models
Xinpeng Wang, Shitong Duan, Xiaoyuan Yi, Jing Yao, Shanlin Zhou, Zhihua Wei, Peng Zhang, Dongkuan Xu, Maosong Sun, Xing Xie
OffRL · 07 Mar 2024

Reward Model Learning vs. Direct Policy Optimization: A Comparative Analysis of Learning from Human Preferences
Andi Nika, Debmalya Mandal, Parameswaran Kamalaruban, Georgios Tzannetos, Goran Radanović, Adish Singla
04 Mar 2024

Batch Active Learning of Reward Functions from Human Preferences
Erdem Biyik, Nima Anari, Dorsa Sadigh
24 Feb 2024

A Dense Reward View on Aligning Text-to-Image Diffusion with Preference
Shentao Yang, Tianqi Chen, Mingyuan Zhou
EGVM · 13 Feb 2024

"Task Success" is not Enough: Investigating the Use of Video-Language
  Models as Behavior Critics for Catching Undesirable Agent Behaviors
"Task Success" is not Enough: Investigating the Use of Video-Language Models as Behavior Critics for Catching Undesirable Agent Behaviors
L. Guan
Yifan Zhou
Denis Liu
Yantian Zha
H. B. Amor
Subbarao Kambhampati
LM&Ro
353
29
0
06 Feb 2024
YODA: Teacher-Student Progressive Learning for Language Models
Jianqiao Lu, Wanjun Zhong, Yufei Wang, Zhijiang Guo, Qi Zhu, ..., Baojun Wang, Yasheng Wang, Lifeng Shang, Xin Jiang, Qun Liu
LRM · 28 Jan 2024

Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation
International Conference on Machine Learning (ICML), 2024
Haoran Xu, Amr Sharaf, Yunmo Chen, Weiting Tan, Lingfeng Shen, Benjamin Van Durme, Kenton W. Murray, Young Jin Kim
ALM · 16 Jan 2024

Page 1 of 2