A General Theoretical Paradigm to Understand Learning from Human Preferences
International Conference on Artificial Intelligence and Statistics (AISTATS), 2023
18 October 2023 · arXiv:2310.12036
M. G. Azar, Mark Rowland, Bilal Piot, Daniel Guo, Daniele Calandriello, Michal Valko, Rémi Munos

Papers citing "A General Theoretical Paradigm to Understand Learning from Human Preferences"

50 / 578 papers shown

When Human Preferences Flip: An Instance-Dependent Robust Loss for RLHF
Yifan Xu, Xichen Ye, Yifan Chen, Qiaosheng Zhang · 30 Nov 2025

Ambiguity Awareness Optimization: Towards Semantic Disambiguation for Direct Preference Optimization
Jian Li, Shenglin Yin, Yujia Zhang, Alan Zhao, Xi Chen, Xiaohui Zhou, Pengfei Xu · 28 Nov 2025

Adversarial Training for Process Reward Models
Gurusha Juneja, Deepak Nathani, William Yang Wang · 28 Nov 2025

Beyond Reward Margin: Rethinking and Resolving Likelihood Displacement in Diffusion Models via Video Generation
Ruojun Xu, Yu Kai, Xuhua Ren, Jiaxiang Cheng, Bing Ma, Tianxiang Zheng, Qinhlin Lu · 24 Nov 2025

STORE: Semantic Tokenization, Orthogonal Rotation and Efficient Attention for Scaling Up Ranking Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2025
Y. Xu, Chaofan Fan, J. Hu, Yu Zhang, Zeng Xiaoyi, J. Zhang · 24 Nov 2025

FanarGuard: A Culturally-Aware Moderation Filter for Arabic Language Models
M. Fatehkia, Enes Altinisik, Husrev Taha Sencar · 24 Nov 2025

TASO: Jailbreak LLMs via Alternative Template and Suffix Optimization
Yanting Wang, Runpeng Geng, Jinghui Chen, Minhao Cheng, Jinyuan Jia · 23 Nov 2025

Bootstrapping LLMs via Preference-Based Policy Optimization
Chen Jia · 17 Nov 2025

Rethinking Deep Alignment Through The Lens Of Incomplete Learning
Thong Bach, D. Nguyen, T. Le, T. Tran · 15 Nov 2025

Feedback Descent: Open-Ended Text Optimization via Pairwise Comparison
Yoonho Lee, Joseph Boen, Chelsea Finn · 11 Nov 2025

HLPD: Aligning LLMs to Human Language Preference for Machine-Revised Text Detection
Fangqi Dai, Xingjian Jiang, Zizhuang Deng · 10 Nov 2025

SPA: Achieving Consensus in LLM Alignment via Self-Priority Optimization
Yue Huang, Xiangqi Wang, Xiangliang Zhang · 09 Nov 2025

The Realignment Problem: When Right becomes Wrong in LLMs
Aakash Sen Sharma, Debdeep Sanyal, Vivek Srivastava, Shirish Karande, Murari Mandal · 04 Nov 2025

VCORE: Variance-Controlled Optimization-based Reweighting for Chain-of-Thought Supervision
Xuan Gong, Senmiao Wang, Hanbo Huang, Ruoyu Sun, Shiyu Liang · 31 Oct 2025

Greedy Sampling Is Provably Efficient for RLHF
Di Wu, Chengshuai Shi, Jing Yang, Cong Shen · 28 Oct 2025

Success and Cost Elicit Convention Formation for Efficient Communication
Saujas Vaduguru, Yilun Hua, Yoav Artzi, Daniel Fried · 28 Oct 2025

Lightweight Robust Direct Preference Optimization
Cheol Woo Kim, Shresth Verma, Mauricio Tec, Milind Tambe · 27 Oct 2025

The Best of N Worlds: Aligning Reinforcement Learning with Best-of-N Sampling via max@k Optimisation
Farid Bagirov, Mikhail Arkhipov, Ksenia Sycheva, Evgeniy Glukhov, Egor Bogomolov · 27 Oct 2025

Aligning Diffusion Language Models via Unpaired Preference Optimization
Vaibhav Jindal, Hejian Sang, Chun-Mao Lai, Yanning Chen, Zhipeng Wang · 26 Oct 2025

Learning "Partner-Aware" Collaborators in Multi-Party Collaboration
Abhijnan Nath, Nikhil Krishnaswamy · 26 Oct 2025

Beyond Pairwise: Empowering LLM Alignment With Ranked Choice Modeling
Yuxuan Tang, Yifan Feng · 24 Oct 2025

Reducing the Probability of Undesirable Outputs in Language Models Using Probabilistic Inference
S. Zhao, Aidan Li, Rob Brekelmans, Roger C. Grosse · 24 Oct 2025

Robust Preference Alignment via Directional Neighborhood Consensus
Ruochen Mao, Yuling Shi, Xiaodong Gu, Jiaheng Wei · 23 Oct 2025

Why DPO is a Misspecified Estimator and How to Fix It
Aditya Gopalan, Sayak Ray Chowdhury, Debangshu Banerjee · 23 Oct 2025

KL-Regularized Reinforcement Learning is Designed to Mode Collapse
Anthony GX-Chen, Jatin Prakash, Jeff Guo, Rob Fergus, Rajesh Ranganath · 23 Oct 2025

g-DPO: Scalable Preference Optimization for Protein Language Models
Constance Ferragu, Jonathan D. Ziegler, Nicolas Deutschmann, Arthur Lindoulsi, Eli Bixby, Cradle ML Team · 22 Oct 2025

ADPO: Anchored Direct Preference Optimization
Wang Zixian · 21 Oct 2025

Eliciting Truthful Feedback for Preference-Based Learning via the VCG Mechanism
Leo Landolt, Anna M. Maddux, Andreas Schlaginhaufen, Saurabh Vaishampayan, Maryam Kamgarpour · 20 Oct 2025

RL makes MLLMs see better than SFT
Junha Song, Sangdoo Yun, Dongyoon Han, Jaegul Choo, Byeongho Heo · 18 Oct 2025

Beyond Correctness: Rewarding Faithful Reasoning in Retrieval-Augmented Generation
Zhichao Xu, Zongyu Wu, Yun Zhou, Aosong Feng, Kang Zhou, ..., Yijun Tian, Xuan Qi, Weikang Qiu, Lin Lee Cheong, Haibo Ding · 15 Oct 2025

Putting on the Thinking Hats: A Survey on Chain of Thought Fine-tuning from the Perspective of Human Reasoning Mechanism
Xiaoshu Chen, Sihang Zhou, Ke Liang, Duanyang Yuan, Haoyuan Chen, Xiaoyu Sun, Linyuan Meng, Xinwang Liu · 15 Oct 2025

Information-Theoretic Reward Modeling for Stable RLHF: Detecting and Mitigating Reward Hacking
Yuchun Miao, Liang Ding, Sen Zhang, Rong Bao, L. Zhang, Dacheng Tao · 15 Oct 2025

Towards Understanding Valuable Preference Data for Large Language Model Alignment
Zizhuo Zhang, Qizhou Wang, Shanshan Ye, Jianing Zhu, Jiangchao Yao, Bo Han, Masashi Sugiyama · 15 Oct 2025

On the Role of Preference Variance in Preference Optimization
Jiacheng Guo, Zihao Li, Jiahao Qiu, Yue Wu, Mengdi Wang · 14 Oct 2025

Reinforced Preference Optimization for Recommendation
Junfei Tan, Yuxin Chen, An Zhang, Junguang Jiang, Yinan Han, Ziru Xu, Han Zhu, Jian Xu, Bo Zheng, Xiang-Bin Wang · 14 Oct 2025

DICE: Structured Reasoning in LLMs through SLM-Guided Chain-of-Thought Correction
Y. Li, Yusheng Liao, Zhe Chen, Yanfeng Wang, Yu Wang · 10 Oct 2025

Mix- and MoE-DPO: A Variational Inference Approach to Direct Preference Optimization
Jason Bohne, Pawel Polak, David S. Rosenberg, Brian Bloniarz, Gary Kazantsev · 09 Oct 2025

PIKA: Expert-Level Synthetic Datasets for Post-Training Alignment from Scratch
Shangjian Yin, Shining Liang, Wenbiao Ding, Yuli Qian, Zhouxing Shi, Hongzhi Li, Yutao Xie · 08 Oct 2025

Provably Mitigating Corruption, Overoptimization, and Verbosity Simultaneously in Offline and Online RLHF/DPO Alignment
Ziyi Chen, Junyi Li, Qi He, Heng-Chiao Huang · 07 Oct 2025

Primal-Dual Direct Preference Optimization for Constrained LLM Alignment
Yihan Du, Seo Taek Kong, R. Srikant · 07 Oct 2025

CAM: A Constructivist View of Agentic Memory for LLM-Based Reading Comprehension
Rui Li, Zeyu Zhang, Xiaohe Bo, Zihang Tian, Xu Chen, Quanyu Dai, Zhenhua Dong, Ruiming Tang · 07 Oct 2025

Margin Adaptive DPO: Leveraging Reward Model for Granular Control in Preference Optimization
Hyung Gyu Rho · 06 Oct 2025

Aligning Language Models with Clinical Expertise: DPO for Heart Failure Nursing Documentation in Critical Care
Junyi Fan, Li Sun, Negin Ashrafi, Kamiar Alaei, Maryam Pishgar · 06 Oct 2025

Distribution Preference Optimization: A Fine-grained Perspective for LLM Unlearning
Kai Qin, Jiaqi Wu, Jianxiang He, Haoyuan Sun, Yifei Zhao, Bin Liang, Yongzhe Chang, Tiantian Zhang, Houde Liu · 06 Oct 2025

Reward Model Routing in Alignment
Xinle Wu, Yao Lu · 03 Oct 2025

Predictive Preference Learning from Human Interventions
Haoyuan Cai, Zhenghao Peng, Bolei Zhou · 02 Oct 2025

How Well Can Preference Optimization Generalize Under Noisy Feedback?
Shawn Im, Yixuan Li · 01 Oct 2025

Safety Instincts: LLMs Learn to Trust Their Internal Compass for Self-Defense
Guobin Shen, Dongcheng Zhao, Haibo Tong, Jindong Li, Feifei Zhao, Yi Zeng · 01 Oct 2025

MCM-DPO: Multifaceted Cross-Modal Direct Preference Optimization for Alt-text Generation
Jinlan Fu, Shenzhen Huangfu, Hao Fei, Yichong Huang, Xiaoyu Shen, Xipeng Qiu, See-Kiong Ng · 01 Oct 2025

Alignment-Aware Decoding
Frédéric Berdoz, Luca A. Lanzendörfer, René Caky, Roger Wattenhofer · 30 Sep 2025