arXiv:2310.12036
A General Theoretical Paradigm to Understand Learning from Human Preferences
International Conference on Artificial Intelligence and Statistics (AISTATS), 2024
Submitted to arXiv: 18 October 2023
Mohammad Gheshlaghi Azar, Mark Rowland, Bilal Piot, Daniel Guo, Daniele Calandriello, Michal Valko, Rémi Munos
Papers citing "A General Theoretical Paradigm to Understand Learning from Human Preferences" (showing 50 of 574)

STORE: Semantic Tokenization, Orthogonal Rotation and Efficient Attention for Scaling Up Ranking Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2025
Y. Xu, Chaofan Fan, J. Hu, Yu Zhang, Xiaoyi Zeng, J. Zhang · 24 Nov 2025

FanarGuard: A Culturally-Aware Moderation Filter for Arabic Language Models
M. Fatehkia, Enes Altinisik, Husrev Taha Sencar · 24 Nov 2025

Beyond Reward Margin: Rethinking and Resolving Likelihood Displacement in Diffusion Models via Video Generation
Ruojun Xu, Yu Kai, Xuhua Ren, Jiaxiang Cheng, Bing Ma, Tianxiang Zheng, Qinglin Lu · 24 Nov 2025 · EGVM

TASO: Jailbreak LLMs via Alternative Template and Suffix Optimization
Yanting Wang, Runpeng Geng, Jinghui Chen, Minhao Cheng, Jinyuan Jia · 23 Nov 2025

Bootstrapping LLMs via Preference-Based Policy Optimization
Chen Jia · 17 Nov 2025 · OffRL

Rethinking Deep Alignment Through The Lens Of Incomplete Learning
Thong Bach, D. Nguyen, T. Le, T. Tran · 15 Nov 2025

Feedback Descent: Open-Ended Text Optimization via Pairwise Comparison
Yoonho Lee, Joseph Boen, Chelsea Finn · 11 Nov 2025

HLPD: Aligning LLMs to Human Language Preference for Machine-Revised Text Detection
Fangqi Dai, Xingjian Jiang, Zizhuang Deng · 10 Nov 2025 · DeLMO

SPA: Achieving Consensus in LLM Alignment via Self-Priority Optimization
Yue Huang, Xiangqi Wang, Xiangliang Zhang · 09 Nov 2025

The Realignment Problem: When Right becomes Wrong in LLMs
Aakash Sen Sharma, Debdeep Sanyal, Vivek Srivastava, Shirish Karande, Murari Mandal · 04 Nov 2025 · MU

VCORE: Variance-Controlled Optimization-based Reweighting for Chain-of-Thought Supervision
Xuan Gong, Senmiao Wang, Hanbo Huang, Ruoyu Sun, Shiyu Liang · 31 Oct 2025 · OffRL · LRM

Greedy Sampling Is Provably Efficient for RLHF
Di Wu, Chengshuai Shi, Jing Yang, Cong Shen · 28 Oct 2025

Success and Cost Elicit Convention Formation for Efficient Communication
Saujas Vaduguru, Yilun Hua, Yoav Artzi, Daniel Fried · 28 Oct 2025 · OffRL

The Best of N Worlds: Aligning Reinforcement Learning with Best-of-N Sampling via max@k Optimisation
Farid Bagirov, Mikhail Arkhipov, Ksenia Sycheva, Evgeniy Glukhov, Egor Bogomolov · 27 Oct 2025

Lightweight Robust Direct Preference Optimization
Cheol Woo Kim, Shresth Verma, Mauricio Tec, Milind Tambe · 27 Oct 2025

Aligning Diffusion Language Models via Unpaired Preference Optimization
Vaibhav Jindal, Hejian Sang, Chun-Mao Lai, Yanning Chen, Zhipeng Wang · 26 Oct 2025

Learning "Partner-Aware" Collaborators in Multi-Party Collaboration
Abhijnan Nath, Nikhil Krishnaswamy · 26 Oct 2025

Beyond Pairwise: Empowering LLM Alignment With Ranked Choice Modeling
Yuxuan Tang, Yifan Feng · 24 Oct 2025

Reducing the Probability of Undesirable Outputs in Language Models Using Probabilistic Inference
S. Zhao, Aidan Li, Rob Brekelmans, Roger C. Grosse · 24 Oct 2025

Why DPO is a Misspecified Estimator and How to Fix It
Aditya Gopalan, Sayak Ray Chowdhury, Debangshu Banerjee · 23 Oct 2025

KL-Regularized Reinforcement Learning is Designed to Mode Collapse
Anthony GX-Chen, Jatin Prakash, Jeff Guo, Rob Fergus, Rajesh Ranganath · 23 Oct 2025

Robust Preference Alignment via Directional Neighborhood Consensus
Ruochen Mao, Yuling Shi, Xiaodong Gu, Jiaheng Wei · 23 Oct 2025

g-DPO: Scalable Preference Optimization for Protein Language Models
Constance Ferragu, Jonathan D. Ziegler, Nicolas Deutschmann, Arthur Lindoulsi, Eli Bixby, Cradle ML Team · 22 Oct 2025

ADPO: Anchored Direct Preference Optimization
Wang Zixian · 21 Oct 2025

Eliciting Truthful Feedback for Preference-Based Learning via the VCG Mechanism
Leo Landolt, Anna M. Maddux, Andreas Schlaginhaufen, Saurabh Vaishampayan, Maryam Kamgarpour · 20 Oct 2025

RL makes MLLMs see better than SFT
Junha Song, Sangdoo Yun, Dongyoon Han, Jaegul Choo, Byeongho Heo · 18 Oct 2025 · OffRL

Putting on the Thinking Hats: A Survey on Chain of Thought Fine-tuning from the Perspective of Human Reasoning Mechanism
Xiaoshu Chen, Sihang Zhou, Ke Liang, Duanyang Yuan, Haoyuan Chen, Xiaoyu Sun, Linyuan Meng, Xinwang Liu · 15 Oct 2025 · ReLM · LRM

Beyond Correctness: Rewarding Faithful Reasoning in Retrieval-Augmented Generation
Zhichao Xu, Zongyu Wu, Yun Zhou, Aosong Feng, Kang Zhou, ..., Yijun Tian, Xuan Qi, Weikang Qiu, Lin Lee Cheong, Haibo Ding · 15 Oct 2025 · OffRL · RALM · LRM

Information-Theoretic Reward Modeling for Stable RLHF: Detecting and Mitigating Reward Hacking
Yuchun Miao, Liang Ding, Sen Zhang, Rong Bao, L. Zhang, Dacheng Tao · 15 Oct 2025

Towards Understanding Valuable Preference Data for Large Language Model Alignment
Zizhuo Zhang, Qizhou Wang, Shanshan Ye, Jianing Zhu, Jiangchao Yao, Bo Han, Masashi Sugiyama · 15 Oct 2025

Reinforced Preference Optimization for Recommendation
Junfei Tan, Yuxin Chen, An Zhang, Junguang Jiang, Yinan Han, Ziru Xu, Han Zhu, Jian Xu, Bo Zheng, Xiang-Bin Wang · 14 Oct 2025 · OffRL

On the Role of Preference Variance in Preference Optimization
Jiacheng Guo, Zihao Li, Jiahao Qiu, Yue Wu, Mengdi Wang · 14 Oct 2025

DICE: Structured Reasoning in LLMs through SLM-Guided Chain-of-Thought Correction
Y. Li, Yusheng Liao, Zhe Chen, Yanfeng Wang, Yu Wang · 10 Oct 2025 · LRM

Mix- and MoE-DPO: A Variational Inference Approach to Direct Preference Optimization
Jason Bohne, Pawel Polak, David S. Rosenberg, Brian Bloniarz, Gary Kazantsev · 09 Oct 2025

PIKA: Expert-Level Synthetic Datasets for Post-Training Alignment from Scratch
Shangjian Yin, Shining Liang, Wenbiao Ding, Yuli Qian, Zhouxing Shi, Hongzhi Li, Yutao Xie · 08 Oct 2025 · ALM

Primal-Dual Direct Preference Optimization for Constrained LLM Alignment
Yihan Du, Seo Taek Kong, R. Srikant · 07 Oct 2025

CAM: A Constructivist View of Agentic Memory for LLM-Based Reading Comprehension
Rui Li, Zeyu Zhang, Xiaohe Bo, Zihang Tian, Xu Chen, Quanyu Dai, Zhenhua Dong, Ruiming Tang · 07 Oct 2025 · RALM

Provably Mitigating Corruption, Overoptimization, and Verbosity Simultaneously in Offline and Online RLHF/DPO Alignment
Ziyi Chen, Junyi Li, Qi He, Heng-Chiao Huang · 07 Oct 2025

Aligning Language Models with Clinical Expertise: DPO for Heart Failure Nursing Documentation in Critical Care
Junyi Fan, Li Sun, Negin Ashrafi, Kamiar Alaei, Maryam Pishgar · 06 Oct 2025

Margin Adaptive DPO: Leveraging Reward Model for Granular Control in Preference Optimization
Hyung Gyu Rho · 06 Oct 2025

Distribution Preference Optimization: A Fine-grained Perspective for LLM Unlearning
Kai Qin, Jiaqi Wu, Jianxiang He, Haoyuan Sun, Yifei Zhao, Bin Liang, Yongzhe Chang, Tiantian Zhang, Houde Liu · 06 Oct 2025 · MU

Reward Model Routing in Alignment
Xinle Wu, Yao Lu · 03 Oct 2025

Predictive Preference Learning from Human Interventions
Haoyuan Cai, Zhenghao Peng, Bolei Zhou · 02 Oct 2025

Safety Instincts: LLMs Learn to Trust Their Internal Compass for Self-Defense
Guobin Shen, Dongcheng Zhao, Haibo Tong, Jindong Li, Feifei Zhao, Yi Zeng · 01 Oct 2025

How Well Can Preference Optimization Generalize Under Noisy Feedback?
Shawn Im, Yixuan Li · 01 Oct 2025

MCM-DPO: Multifaceted Cross-Modal Direct Preference Optimization for Alt-text Generation
Jinlan Fu, Shenzhen Huangfu, Hao Fei, Yichong Huang, Xiaoyu Shen, Xipeng Qiu, See-Kiong Ng · 01 Oct 2025

Alignment-Aware Decoding
Frédéric Berdoz, Luca A. Lanzendörfer, René Caky, Roger Wattenhofer · 30 Sep 2025

Improving Sampling Efficiency in RLVR through Adaptive Rollout and Response Reuse
Yuheng Zhang, Wenlin Yao, Changlong Yu, Yao Liu, Qingyu Yin, Bing Yin, Hyokun Yun, Lihong Li · 30 Sep 2025

Humanline: Online Alignment as Perceptual Loss
Sijia Liu, Niklas Muennighoff, Kawin Ethayarajh · 29 Sep 2025

The Era of Real-World Human Interaction: RL from User Conversations
Chuanyang Jin, Jing Xu, Bo Liu, Leitian Tao, O. Yu. Golovneva, Tianmin Shu, Wenting Zhao, Xian Li, Jason Weston · 29 Sep 2025 · OffRL