arXiv:2310.12036
A General Theoretical Paradigm to Understand Learning from Human Preferences
International Conference on Artificial Intelligence and Statistics (AISTATS), 2024
18 October 2023
M. G. Azar
Mark Rowland
Bilal Piot
Daniel Guo
Daniele Calandriello
Michal Valko
Rémi Munos
Papers citing "A General Theoretical Paradigm to Understand Learning from Human Preferences" (50 of 578 papers shown)
Process Reinforcement through Implicit Rewards
Ganqu Cui
Lifan Yuan
Liang Luo
Hanbin Wang
Wendi Li
...
Maosong Sun
Zhiyuan Liu
Ning Ding
Bowen Zhou
OffRL
LRM
430
220
0
03 Feb 2025
The Energy Loss Phenomenon in RLHF: A New Perspective on Mitigating Reward Hacking
Yuchun Miao
Sen Zhang
Liang Ding
Yuqi Zhang
Guang Dai
Dacheng Tao
631
9
0
31 Jan 2025
LiPO: Listwise Preference Optimization through Learning-to-Rank
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Tianqi Liu
Zhen Qin
Junru Wu
Jiaming Shen
Misha Khalman
...
Mohammad Saleh
Simon Baumgartner
Jialu Liu
Peter J. Liu
Xuanhui Wang
578
81
0
28 Jan 2025
Style Outweighs Substance: Failure Modes of LLM Judges in Alignment Benchmarking
International Conference on Learning Representations (ICLR), 2024
Benjamin Feuer
Micah Goldblum
Teresa Datta
Sanjana Nambiar
Raz Besaleli
Samuel Dooley
Max Cembalest
John P. Dickerson
ALM
322
0
0
28 Jan 2025
Inverse-RLignment: Large Language Model Alignment from Demonstrations through Inverse Reinforcement Learning
Hao Sun
M. van der Schaar
375
23
0
28 Jan 2025
Test-Time Preference Optimization: On-the-Fly Alignment via Iterative Textual Feedback
Yafu Li
Xuyang Hu
Xiaoye Qu
Linjie Li
Yu Cheng
283
31
0
22 Jan 2025
Contrastive Policy Gradient: Aligning LLMs on sequence-level scores in a supervised-friendly fashion
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Yannis Flet-Berliac
Nathan Grinsztajn
Florian Strub
Bill Wu
Eugene Choi
...
Arash Ahmadian
Yash Chandak
M. G. Azar
Olivier Pietquin
Matthieu Geist
OffRL
364
13
0
17 Jan 2025
Iterative Label Refinement Matters More than Preference Optimization under Weak Supervision
International Conference on Learning Representations (ICLR), 2025
Yaowen Ye
Cassidy Laidlaw
Jacob Steinhardt
ALM
198
2
0
14 Jan 2025
Preference-Based Multi-Agent Reinforcement Learning: Data Coverage and Algorithmic Techniques
Natalia Zhang
X. Wang
Qiwen Cui
Runlong Zhou
Sham Kakade
Simon S. Du
OffRL
401
1
0
10 Jan 2025
Segmenting Text and Learning Their Rewards for Improved RLHF in Language Model
Yueqin Yin
Shentao Yang
Yujia Xie
Ziyi Yang
Yuting Sun
Hany Awadalla
Weizhu Chen
Mingyuan Zhou
304
5
0
07 Jan 2025
An Overview and Discussion on Using Large Language Models for Implementation Generation of Solutions to Open-Ended Problems
Hashmath Shaik
Alex Doboli
OffRL
ELM
1.0K
2
0
31 Dec 2024
Geometric-Averaged Preference Optimization for Soft Preference Labels
Neural Information Processing Systems (NeurIPS), 2024
Hiroki Furuta
Kuang-Huei Lee
Shixiang Shane Gu
Y. Matsuo
Aleksandra Faust
Heiga Zen
Izzeddin Gur
404
15
0
31 Dec 2024
Verbosity-Aware Rationale Reduction: Effective Reduction of Redundant Rationale via Principled Criteria
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Joonwon Jang
Jaehee Kim
Wonbin Kweon
Seonghyeon Lee
Hwanjo Yu
LRM
493
2
0
30 Dec 2024
Understanding the Logic of Direct Preference Alignment through Logic
Kyle Richardson
Vivek Srikumar
Ashish Sabharwal
485
4
0
23 Dec 2024
JailPO: A Novel Black-box Jailbreak Framework via Preference Optimization against Aligned LLMs
AAAI Conference on Artificial Intelligence (AAAI), 2024
Haoyang Li
Jiawei Ye
Jie Wu
Tianjie Yan
Chu Wang
Zhixin Li
AAML
180
5
0
20 Dec 2024
REFA: Reference Free Alignment for multi-preference optimization
Taneesh Gupta
Rahul Madhavan
Xuchao Zhang
Chetan Bansal
Saravan Rajmohan
435
1
0
20 Dec 2024
Energy-Based Preference Model Offers Better Offline Alignment than the Bradley-Terry Preference Model
Yuzhong Hong
Hanshan Zhang
Junwei Bao
Hongfei Jiang
Yang Song
OffRL
225
6
0
18 Dec 2024
The Superalignment of Superhuman Intelligence with Large Language Models
Science China Information Sciences (Sci. China Inf. Sci.), 2024
Shiyu Huang
Yingkang Wang
Shiyao Cui
Pei Ke
J. Tang
426
1
0
15 Dec 2024
Hybrid Preference Optimization for Alignment: Provably Faster Convergence Rates by Combining Offline Preferences with Online Exploration
Avinandan Bose
Zhihan Xiong
Aadirupa Saha
S. Du
Maryam Fazel
303
2
0
13 Dec 2024
Reinforcement Learning Enhanced LLMs: A Survey
Shuhe Wang
Shengyu Zhang
Jing Zhang
Runyi Hu
Xiaoya Li
Minlie Huang
Jiwei Li
Leilei Gan
G. Wang
Eduard H. Hovy
OffRL
662
48
0
05 Dec 2024
Time-Reversal Provides Unsupervised Feedback to LLMs
Neural Information Processing Systems (NeurIPS), 2024
Yerram Varun
Rahul Madhavan
Sravanti Addepalli
A. Suggala
Karthikeyan Shanmugam
Prateek Jain
LRM
SyDa
345
1
0
03 Dec 2024
VideoSAVi: Self-Aligned Video Language Models without Human Supervision
Yogesh Kulkarni
Pooyan Fazli
VLM
582
5
0
01 Dec 2024
AToM: Aligning Text-to-Motion Model at Event-Level with GPT-4Vision Reward
Computer Vision and Pattern Recognition (CVPR), 2024
Haonan Han
Xiangzuo Wu
Huan Liao
Zunnan Xu
Zhongyuan Hu
Ronghui Li
Yachao Zhang
Xiu Li
VGen
179
5
0
27 Nov 2024
Learning from Relevant Subgoals in Successful Dialogs using Iterative Training for Task-oriented Dialog Systems
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Magdalena Kaiser
P. Ernst
György Szarvas
175
1
0
25 Nov 2024
DSTC: Direct Preference Learning with Only Self-Generated Tests and Code to Improve Code LMs
Zhihan Liu
Shenao Zhang
Yongfei Liu
Boyi Liu
Yingxiang Yang
Zhaoran Wang
377
6
0
20 Nov 2024
Reward Modeling with Ordinal Feedback: Wisdom of the Crowd
Shang Liu
Yu Pan
Guanting Chen
Xiaocheng Li
322
3
0
19 Nov 2024
Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering
Xinyan Guan
Yanjiang Liu
Xinyu Lu
Boxi Cao
Xianpei Han
...
Le Sun
Jie Lou
Bowen Yu
Yaojie Lu
Hongyu Lin
ALM
534
8
0
18 Nov 2024
Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization
Weiyun Wang
Zhe Chen
Wenhai Wang
Yue Cao
Yangzhou Liu
...
Jinguo Zhu
X. Zhu
Lewei Lu
Yu Qiao
Jifeng Dai
LRM
478
176
1
15 Nov 2024
Non-Adversarial Inverse Reinforcement Learning via Successor Feature Matching
International Conference on Learning Representations (ICLR), 2024
A. Jain
Harley Wiltzer
Jesse Farebrother
Irina Rish
Glen Berseth
Sanjiban Choudhury
322
6
0
11 Nov 2024
Towards Improved Preference Optimization Pipeline: from Data Generation to Budget-Controlled Regularization
Zhuotong Chen
Fang Liu
Jennifer Zhu
Wanyu Du
Yanjun Qi
261
2
0
07 Nov 2024
Sharp Analysis for KL-Regularized Contextual Bandits and RLHF
Heyang Zhao
Chenlu Ye
Quanquan Gu
Tong Zhang
OffRL
479
10
0
07 Nov 2024
Sample-Efficient Alignment for LLMs
Zichen Liu
Changyu Chen
Chao Du
Wee Sun Lee
Min Lin
245
9
0
03 Nov 2024
TODO: Enhancing LLM Alignment with Ternary Preferences
International Conference on Learning Representations (ICLR), 2024
Yuxiang Guo
Lu Yin
Bo Jiang
Jiaqi Zhang
327
5
0
02 Nov 2024
Grounding by Trying: LLMs with Reinforcement Learning-Enhanced Retrieval
International Conference on Learning Representations (ICLR), 2024
Sheryl Hsu
Omar Khattab
Chelsea Finn
Archit Sharma
KELM
RALM
277
15
0
30 Oct 2024
VPO: Leveraging the Number of Votes in Preference Optimization
Jae Hyeon Cho
Minkyung Park
Byung-Jun Lee
78
2
0
30 Oct 2024
COMAL: A Convergent Meta-Algorithm for Aligning LLMs with General Preferences
Yongxu Liu
Argyris Oikonomou
Weiqiang Zheng
Yang Cai
Arman Cohan
291
3
0
30 Oct 2024
f-PO: Generalizing Preference Optimization with f-divergence Minimization
International Conference on Artificial Intelligence and Statistics (AISTATS), 2024
Jiaqi Han
Mingjian Jiang
Yuxuan Song
J. Leskovec
Stefano Ermon
326
9
0
29 Oct 2024
Matryoshka Pilot: Learning to Drive Black-Box LLMs with LLMs
Changhao Li
Yuchen Zhuang
Rushi Qiang
Haotian Sun
H. Dai
Chao Zhang
Bo Dai
LRM
252
6
0
28 Oct 2024
UFT: Unifying Fine-Tuning of SFT and RLHF/DPO/UNA through a Generalized Implicit Reward Function
Zhichao Wang
Bin Bi
Z. Zhu
Xiangbo Mao
Jun Wang
Shiyu Wang
CLL
231
5
0
28 Oct 2024
Learning from Response not Preference: A Stackelberg Approach for LLM Detoxification using Non-parallel Data
Xinhong Xie
Tao Li
Quanyan Zhu
166
4
0
27 Oct 2024
Fast Best-of-N Decoding via Speculative Rejection
Neural Information Processing Systems (NeurIPS), 2024
Hanshi Sun
Momin Haider
Ruiqi Zhang
Huitao Yang
Jiahao Qiu
Ming Yin
Mengdi Wang
Peter L. Bartlett
Andrea Zanette
BDL
335
92
0
26 Oct 2024
Uncertainty-Penalized Direct Preference Optimization
Sam Houliston
Alizée Pace
Alexander Immer
Gunnar Rätsch
128
0
0
26 Oct 2024
2D-DPO: Scaling Direct Preference Optimization with 2-Dimensional Supervision
Shilong Li
Yancheng He
Hui Huang
Xingyuan Bu
Qingbin Liu
Hangyu Guo
Weixun Wang
Jihao Gu
Yuchi Xu
Bo Zheng
196
8
0
25 Oct 2024
Inference time LLM alignment in single and multidomain preference spectrum
Siyang Song
Zheng Qi
Nikolaos Pappas
Srikanth Doss Kadarundalagi Raghuram Doss
Monica Sunkara
Kishaloy Halder
Manuel Mager
Yassine Benajiba
133
3
0
24 Oct 2024
Weak-to-Strong Preference Optimization: Stealing Reward from Weak Aligned Model
International Conference on Learning Representations (ICLR), 2024
Wenhong Zhu
Zhiwei He
Xiaofeng Wang
Pengfei Liu
Rui Wang
OSLM
326
12
0
24 Oct 2024
Scalable Ranked Preference Optimization for Text-to-Image Generation
Shyamgopal Karthik
Huseyin Coskun
Zeynep Akata
Sergey Tulyakov
J. Ren
Vidit Goel
EGVM
287
19
0
23 Oct 2024
Optimal Design for Reward Modeling in RLHF
Antoine Scheid
Etienne Boursier
Alain Durmus
Michael I. Jordan
Pierre Ménard
Eric Moulines
Michal Valko
OffRL
401
16
0
22 Oct 2024
Understanding Forgetting in LLM Supervised Fine-Tuning and Preference Learning - A Convex Optimization Perspective
H. Fernando
Han Shen
Parikshit Ram
Yi Zhou
Horst Samulowitz
Nathalie Baracaldo
Tianyi Chen
CLL
410
10
0
20 Oct 2024
GDPO: Learning to Directly Align Language Models with Diversity Using GFlowNets
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Oh Joon Kwon
Daiki E. Matsunaga
Kee-Eung Kim
AI4CE
213
4
0
19 Oct 2024
TreeBoN: Enhancing Inference-Time Alignment with Speculative Tree-Search and Best-of-N Sampling
Jiahao Qiu
Yifu Lu
Yifan Zeng
Jiacheng Guo
Jiayi Geng
...
Ling Yang
Kaixuan Huang
Yue Wu
Mengdi Wang
402
49
0
18 Oct 2024