arXiv:2310.12036
A General Theoretical Paradigm to Understand Learning from Human Preferences
International Conference on Artificial Intelligence and Statistics (AISTATS), 2023
18 October 2023
M. G. Azar, Mark Rowland, Bilal Piot, Daniel Guo, Daniele Calandriello, Michal Valko, Rémi Munos

Papers citing "A General Theoretical Paradigm to Understand Learning from Human Preferences" (showing 50 of 578)

Improving Sampling Efficiency in RLVR through Adaptive Rollout and Response Reuse
Yuheng Zhang, Wenlin Yao, Changlong Yu, Yao Liu, Qingyu Yin, Bing Yin, Hyokun Yun, Lihong Li (30 Sep 2025)

The Era of Real-World Human Interaction: RL from User Conversations
Chuanyang Jin, Jing Xu, Bo Liu, Leitian Tao, O. Yu. Golovneva, Tianmin Shu, Wenting Zhao, Xian Li, Jason Weston (29 Sep 2025) [OffRL]

Humanline: Online Alignment as Perceptual Loss
Sijia Liu, Niklas Muennighoff, Kawin Ethayarajh (29 Sep 2025)

Clean First, Align Later: Benchmarking Preference Data Cleaning for Reliable LLM Alignment
Min-Hsuan Yeh, Yixuan Li (28 Sep 2025)

Alignment through Meta-Weighted Online Sampling: Bridging the Gap between Data Generation and Preference Optimization
Junming Yang, Ning Xu, Biao Liu, Shiqi Qiao, Xin Geng (27 Sep 2025)

Multiplayer Nash Preference Optimization
Fang Wu, X. Y. Huang, Weihao Xuan, Zhiwei Zhang, Yijia Xiao, ..., Xiaomin Li, Bing Hu, Peng Xia, Jure Leskovec, Yejin Choi (27 Sep 2025)

General Exploratory Bonus for Optimistic Exploration in RLHF
W. Li, Changdae Oh, Yixuan Li (27 Sep 2025) [AI4CE]

Adaptive Margin RLHF via Preference over Preferences
Yaswanth Chittepu, Prasann Singhal, Greg Durrett, S. Niekum (26 Sep 2025)

Towards Efficient Online Exploration for Reinforcement Learning with Human Feedback
Gen Li, Yuling Yan (26 Sep 2025) [OffRL]

Chasing the Tail: Effective Rubric-based Reward Modeling for Large Language Model Post-Training
Junkai Zhang, Zihao Wang, Lin Gui, Swarnashree Mysore Sathyendra, Jaehwan Jeong, Victor Veitch, Wei Wang, Yunzhong He, Bing Liu, Lifeng Jin (25 Sep 2025) [ALM, LRM]

Future Policy Aware Preference Learning for Mathematical Reasoning
Minjae Oh, Yunho Choi, Dongmin Choi, Yohan Jo (24 Sep 2025)

Failure Modes of Maximum Entropy RLHF
Ömer Veysel Çağatan, Barış Akgün (24 Sep 2025)

DriveDPO: Policy Learning via Safety DPO For End-to-End Autonomous Driving
Shuyao Shang, Yuntao Chen, Yuqi Wang, Yingyan Li, Zhaoxiang Zhang (22 Sep 2025)

TempSamp-R1: Effective Temporal Sampling with Reinforcement Fine-Tuning for Video LLMs
Yunheng Li, Jing Cheng, Shaoyong Jia, Hangyi Kuang, Shaohui Jiao, Qibin Hou, Ming-Ming Cheng (22 Sep 2025) [AI4TS, VLM]

OmniInsert: Mask-Free Video Insertion of Any Reference via Diffusion Transformer Models
Jinshu Chen, Xinghui Li, Xu Bai, Tianxiang Ma, Pengze Zhang, ..., Gen Li, Lijie Liu, Songtao Zhao, Bingchuan Li, Qian He (22 Sep 2025) [DiffM, VGen]

Reinforcement Learning Meets Large Language Models: A Survey of Advancements and Applications Across the LLM Lifecycle
Keliang Liu, Dingkang Yang, Ziyun Qian, Weijie Yin, Y. Wang, Hongsheng Li, Jun Liu, Peng Zhai, Y. Liu, Lihua Zhang (20 Sep 2025) [OffRL, LRM]

Towards Universal Debiasing for Language Models-based Tabular Data Generation
Tianchun Li, Tianci Liu, Xingchen Wang, Rongzhe Wei, P. Li, Lu Su, Jing Gao (20 Sep 2025)

FlowRL: Matching Reward Distributions for LLM Reasoning
Xuekai Zhu, Daixuan Cheng, D. Zhang, Hengli Li, Kaiyan Zhang, ..., J. Gao, Xiaodong Liu, Bowen Zhou, Hongyuan Mei, Zhouhan Lin (18 Sep 2025) [LRM]

CM-Align: Consistency-based Multilingual Alignment for Large Language Models
Xue Zhang, Yunlong Liang, Fandong Meng, Songming Zhang, Yufeng Chen, Jinan Xu, Jie Zhou (10 Sep 2025)

Outcome-based Exploration for LLM Reasoning
Yuda Song, Julia Kempe, Remi Munos (08 Sep 2025) [OffRL, LRM]

Let's Roleplay: Examining LLM Alignment in Collaborative Dialogues
Abhijnan Nath, Carine Graff, Nikhil Krishnaswamy (07 Sep 2025) [LLMAG]

COMMET: A System for Human-Induced Conflicts in Mobile Manipulation of Everyday Tasks
Dongping Li, Shaoting Peng, John Pohovey, Katherine Rose Driggs-Campbell (05 Sep 2025)

Adaptive Preference Optimization with Uncertainty-aware Utility Anchor
Xiaobo Wang, Zixia Jia, Jiaqi Li, Qi Liu, Zilong Zheng (03 Sep 2025)

Weights-Rotated Preference Optimization for Large Language Models
Chenxu Yang, Ruipeng Jia, Mingyu Zheng, Naibin Gu, Zheng Lin, Siyuan Chen, Weichong Yin, Hua Wu, Weiping Wang (25 Aug 2025)

What Matters in Data for DPO?
Yu Pan, Zhongze Cai, Guanting Chen, Huaiyang Zhong, Chonghuan Wang (23 Aug 2025)

Political Ideology Shifts in Large Language Models
Pietro Bernardelle, Stefano Civelli, Leon Fröhling, Riccardo Lunardi, Kevin Roitero, Gianluca Demartini (22 Aug 2025)

CARFT: Boosting LLM Reasoning via Contrastive Learning with Annotated Chain-of-Thought-based Reinforced Fine-Tuning
Wenqiao Zhu, Ji Liu, Rongjuncheng Zhang, Haipang Wu, Yulun Zhang (21 Aug 2025) [OffRL, LRM]

Linear Preference Optimization: Decoupled Gradient Control via Absolute Regularization
Rui Wang, Qianguo Sun, Chao Song, Junlong Wu, Tianrong Chen, Zhiyun Zeng, Yu Li (20 Aug 2025)

DegDiT: Controllable Audio Generation with Dynamic Event Graph Guided Diffusion Transformer
Yisu Liu, Chenxing Li, Wanqian Zhang, Wenfu Wang, Meng Yu, Ruibo Fu, Zheng Lin, Weiping Wang, Dong Yu (19 Aug 2025) [DiffM]

FantasyTalking2: Timestep-Layer Adaptive Preference Optimization for Audio-Driven Portrait Animation
Mengchao Wang, Qiang Wang, Fan Jiang, Mu Xu (15 Aug 2025) [EGVM, VGen]

On Negative-aware Preference Optimization for Recommendation
Chenlu Ding, Daoxuan Liu, Jiancan Wu, Xingyu Hu, Junkang Wu, Haitao Wang, Yongkang Wang, Xingxing Wang, Xiang Wang (13 Aug 2025)

Fine-grained Video Dubbing Duration Alignment with Segment Supervised Preference Optimization
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Chaoqun Cui, Liangbin Huang, Shijing Wang, Zhe Tong, Zhaolong Huang, Xiao Zeng, Xiaofeng Liu (12 Aug 2025)

Beyond Ordinal Preferences: Why Alignment Needs Cardinal Human Feedback
Parker Whitfill, Stewy Slocum (11 Aug 2025) [ALM]

Enhancing Small LLM Alignment through Margin-Based Objective Modifications under Resource Constraints
Daren Yao, Jinsong Yuan, Ruike Chen (11 Aug 2025)

Sample-efficient LLM Optimization with Reset Replay
Zichuan Liu, Jinyu Wang, Lei Song, Jiang Bian (08 Aug 2025) [OffRL]

Forgetting: A New Mechanism Towards Better Large Language Model Fine-tuning
Ali Taheri Ghahrizjani, Alireza Taban, Qizhou Wang, Shanshan Ye, Tongliang Liu (06 Aug 2025) [CLL, MU]

Generative Bid Shading in Real-Time Bidding Advertising
Yinqiu Huang, Hao Ma, Wenshuai Chen, Shuli Wang, Yongqiang Zhang, Xue Wei, Yinhua Zhu, Haitao Wang, Xingxing Wang (06 Aug 2025)

Difficulty-Based Preference Data Selection by DPO Implicit Reward Gap
Xuan Qi, Rongwu Xu, Zhijing Jin (06 Aug 2025)

V.I.P.: Iterative Online Preference Distillation for Efficient Video Diffusion Models
Jisoo Kim, Wooseok Seo, Junwan Kim, Seungho Park, Sooyeon Park, Youngjae Yu (05 Aug 2025) [VGen]

Enhancing Math Reasoning in Small-sized LLMs via Preview Difficulty-Aware Intervention
Xinhan Di, JoyJiaoW (03 Aug 2025) [LRM]

Phi-Ground Tech Report: Advancing Perception in GUI Grounding
Miaosen Zhang, Ziqiang Xu, Jialiang Zhu, Qi Dai, Kai Qiu, ..., Chong Luo, Tianyi Chen, Justin Wagle, Tim Franklin, Baining Guo (31 Jul 2025) [LRM]

Unlearning of Knowledge Graph Embedding via Preference Optimization
Jiajun Liu, Wenjun Ke, Peng Wang, Yao He, Ziyu Shang, Guozheng Li, Zijie Xu, Ke Ji (28 Jul 2025) [MU]

MaPPO: Maximum a Posteriori Preference Optimization with Prior Knowledge
Guangchen Lan, Sipeng Zhang, Tianle Wang, Yuwei Zhang, Daoan Zhang, Xinpeng Wei, Xiaoman Pan, Hongming Zhang, Dong-Jun Han, Christopher G. Brinton (27 Jul 2025)

PITA: Preference-Guided Inference-Time Alignment for LLM Post-Training
Sarat Chandra Bobbili, Ujwal Dinesha, Dheeraj Narasimha, S. Shakkottai (26 Jul 2025)

Quantile Reward Policy Optimization: Alignment with Pointwise Regression and Exact Partition Functions
Simon Matrenok, Skander Moalla, Çağlar Gülçehre (10 Jul 2025)

Principled Foundations for Preference Optimization
Wenxuan Zhou, Shujian Zhang, Brice Magdalou, John Lambert, Ehsan Amid, Richard Nock, Andrew Straiton Hard (10 Jul 2025)

The Hidden Link Between RLHF and Contrastive Learning
Xufei Lv, Kehai Chen, Haoyuan Sun, X. Bai, Min Zhang, Houde Liu (27 Jun 2025)

Fine-Grained Preference Optimization Improves Spatial Reasoning in VLMs
Yifan Shen, Yuanzhe Liu, Jingyuan Zhu, Xu Cao, Xiaofeng Zhang, Yixiao He, Wenming Ye, James M. Rehg, Ismini Lourentzou (26 Jun 2025) [LRM]

Rethinking DPO: The Role of Rejected Responses in Preference Misalignment
Jay Hyeon Cho, JunHyeok Oh, Myunsoo Kim, Byung-Jun Lee (15 Jun 2025)

Theoretical Tensions in RLHF: Reconciling Empirical Success with Inconsistencies in Social Choice Theory
Jiancong Xiao, Zhekun Shi, Kaizhao Liu, Q. Long, Weijie J. Su (14 Jun 2025)