A General Theoretical Paradigm to Understand Learning from Human Preferences (arXiv 2310.12036)
18 October 2023
M. G. Azar, Mark Rowland, Bilal Piot, Daniel Guo, Daniele Calandriello, Michal Valko, Rémi Munos
Papers citing "A General Theoretical Paradigm to Understand Learning from Human Preferences" (50 of 415 papers shown)
A Gradient Analysis Framework for Rewarding Good and Penalizing Bad Examples in Language Models
Yi-Lin Tuan, William Yang Wang (29 Aug 2024)
Bi-Factorial Preference Optimization: Balancing Safety-Helpfulness in Language Models
Wenxuan Zhang, Philip H. S. Torr, Mohamed Elhoseiny, Adel Bibi (27 Aug 2024)
Selective Preference Optimization via Token-Level Reward Function Estimation
Kailai Yang, Zhiwei Liu, Qianqian Xie, Jimin Huang, Erxue Min, Sophia Ananiadou (24 Aug 2024)
Aligning (Medical) LLMs for (Counterfactual) Fairness
Raphael Poulain, Hamed Fayyaz, Rahmatollah Beheshti (22 Aug 2024)
Personality Alignment of Large Language Models
Minjun Zhu, Linyi Yang, Yue Zhang (21 Aug 2024) [ALM]
Minor SFT loss for LLM fine-tune to increase performance and reduce model deviation
Shiming Xie, Hong Chen, Fred Yu, Zeye Sun, Xiuyu Wu (20 Aug 2024)
CLIP-DPO: Vision-Language Models as a Source of Preference for Fixing Hallucinations in LVLMs
Yassine Ouali, Adrian Bulat, Brais Martínez, Georgios Tzimiropoulos (19 Aug 2024) [VLM, MLLM]
Minor DPO reject penalty to increase training robustness
Shiming Xie, Hong Chen, Fred Yu, Zeye Sun, Xiuyu Wu, Yingfan Hu (19 Aug 2024)
Bridging and Modeling Correlations in Pairwise Data for Direct Preference Optimization
Yuxin Jiang, Bo Huang, Yufei Wang, Xingshan Zeng, Liangyou Li, Yasheng Wang, Xin Jiang, Lifeng Shang, Ruiming Tang, Wei Wang (14 Aug 2024)
Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment
Karel D'Oosterlinck, Winnie Xu, Chris Develder, Thomas Demeester, A. Singh, Christopher Potts, Douwe Kiela, Shikib Mehri (12 Aug 2024)
Unlocking Decoding-time Controllability: Gradient-Free Multi-Objective Alignment with Contrastive Prompts
Tingchen Fu, Yupeng Hou, Julian McAuley, Rui Yan (09 Aug 2024)
On the Generalization of Preference Learning with DPO
Shawn Im, Yixuan Li (06 Aug 2024)
Apple Intelligence Foundation Language Models
Tom Gunter, Zirui Wang, Chong-Jun Wang, Ruoming Pang, Andy Narayanan, ..., Xinwen Liu, Yang Zhao, Yin Xia, Zhile Ren, Zhongzheng Ren (29 Jul 2024)
Right Now, Wrong Then: Non-Stationary Direct Preference Optimization under Preference Drift
Seongho Son, William Bankes, Sayak Ray Chowdhury, Brooks Paige, Ilija Bogunovic (26 Jul 2024)
A Comprehensive Survey of LLM Alignment Techniques: RLHF, RLAIF, PPO, DPO and More
Zhichao Wang, Bin Bi, Shiva K. Pentyala, Kiran Ramnath, Sougata Chaudhuri, ..., Z. Zhu, Xiang-Bo Mao, S. Asur, Na Cheng (23 Jul 2024) [OffRL]
ALLaM: Large Language Models for Arabic and English
M Saiful Bari, Yazeed Alnumay, Norah A. Alzahrani, Nouf M. Alotaibi, H. A. Alyahya, ..., Jeril Kuriakose, Abdalghani Abujabal, Nora Al-Twairesh, Areeb Alowisheq, Haidar Khan (22 Jul 2024)
Data-Centric Human Preference Optimization with Rationales
H. Just, Ming Jin, Anit Kumar Sahu, Huy Phan, Ruoxi Jia (19 Jul 2024)
Clinical Reading Comprehension with Encoder-Decoder Models Enhanced by Direct Preference Optimization
Md Sultan al Nahian, R. Kavuluru (19 Jul 2024) [MedIm, AI4CE]
BiasDPO: Mitigating Bias in Language Models through Direct Preference Optimization
Ahmed Allam (18 Jul 2024)
Understanding Reference Policies in Direct Preference Optimization
Yixin Liu, Pengfei Liu, Arman Cohan (18 Jul 2024)
New Desiderata for Direct Preference Optimization
Xiangkun Hu, Tong He, David Wipf (12 Jul 2024)
Self-training Language Models for Arithmetic Reasoning
Marek Kadlcík, Michal Štefánik (11 Jul 2024) [KELM, ReLM, OffRL, LRM]
PEER: Expertizing Domain-Specific Tasks with a Multi-Agent Framework and Tuning Methods
Yiying Wang, Xiaojing Li, Binzhu Wang, Yueyang Zhou, Yingru Lin, ..., Fei Yu, Zewei Zhao, Song Jin, Renji Gong, Wanqing Xu (09 Jul 2024)
On the Limitations of Compute Thresholds as a Governance Strategy
Sara Hooker (08 Jul 2024)
Variational Best-of-N Alignment
Afra Amini, Tim Vieira, Ryan Cotterell (08 Jul 2024) [BDL]
AI Safety in Generative AI Large Language Models: A Survey
Jaymari Chua, Yun Yvonna Li, Shiyi Yang, Chen Wang, Lina Yao (06 Jul 2024) [LM&MA]
HAF-RM: A Hybrid Alignment Framework for Reward Model Training
Shujun Liu, Xiaoyu Shen, Yuhang Lai, Siyuan Wang, Shengbin Yue, Zengfeng Huang, Xuanjing Huang, Zhongyu Wei (04 Jul 2024)
Orchestrating LLMs with Different Personalizations
Jin Peng Zhou, Katie Z Luo, Jingwen Gu, Jason Yuan, Kilian Q. Weinberger, Wen Sun (04 Jul 2024)
Understanding Alignment in Multimodal LLMs: A Comprehensive Study
Elmira Amirloo, J. Fauconnier, Christoph Roesmann, Christian Kerl, Rinu Boney, ..., Zirui Wang, Afshin Dehghan, Yinfei Yang, Zhe Gan, Peter Grasch (02 Jul 2024)
RLHF Can Speak Many Languages: Unlocking Multilingual Preference Optimization for LLMs
John Dang, Arash Ahmadian, Kelly Marchisio, Julia Kreutzer, A. Ustun, Sara Hooker (02 Jul 2024)
Aligning Target-Aware Molecule Diffusion Models with Exact Energy Optimization
Siyi Gu, Minkai Xu, Alexander Powers, Weili Nie, Tomas Geffner, Karsten Kreis, J. Leskovec, Arash Vahdat, Stefano Ermon (01 Jul 2024)
Step-Controlled DPO: Leveraging Stepwise Error for Enhanced Mathematical Reasoning
Zimu Lu, Aojun Zhou, Ke Wang, Houxing Ren, Weikang Shi, Junting Pan, Mingjie Zhan, Hongsheng Li (30 Jun 2024) [LRM]
BAPO: Base-Anchored Preference Optimization for Personalized Alignment in Large Language Models
Gihun Lee, Minchan Jeong, Yujin Kim, Hojung Jung, Jaehoon Oh, Sangmook Kim, Se-Young Yun (30 Jun 2024)
Iterative Nash Policy Optimization: Aligning LLMs with General Preferences via No-Regret Learning
Yuheng Zhang, Dian Yu, Baolin Peng, Linfeng Song, Ye Tian, Mingyue Huo, Nan Jiang, Haitao Mi, Dong Yu (30 Jun 2024)
STLLaVA-Med: Self-Training Large Language and Vision Assistant for Medical
Guohao Sun, Can Qin, Huazhu Fu, Linwei Wang, Zhiqiang Tao (28 Jun 2024) [LM&MA]
Direct Preference Knowledge Distillation for Large Language Models
Yixing Li, Yuxian Gu, Li Dong, Dequan Wang, Yu Cheng, Furu Wei (28 Jun 2024)
Averaging log-likelihoods in direct alignment
Nathan Grinsztajn, Yannis Flet-Berliac, M. G. Azar, Florian Strub, Bill Wu, ..., Chris Cremer, Arash Ahmadian, Yash Chandak, Olivier Pietquin, Matthieu Geist (27 Jun 2024) [MoMe]
Alignment For Performance Improvement in Conversation Bots
Raghav Garg, Kapil Sharma, Shrey Singla (27 Jun 2024)
Decoding-Time Language Model Alignment with Multiple Objectives
Ruizhe Shi, Yifang Chen, Yushi Hu, Alisa Liu, Hannaneh Hajishirzi, Noah A. Smith, Simon Du (27 Jun 2024)
Encourage or Inhibit Monosemanticity? Revisit Monosemanticity from a Feature Decorrelation Perspective
Hanqi Yan, Yanzheng Xiang, Guangyi Chen, Yifei Wang, Lin Gui, Yulan He (25 Jun 2024)
Not All Preference Pairs Are Created Equal: A Recipe for Annotation-Efficient Iterative Preference Learning
Sen Yang, Leyang Cui, Deng Cai, Xinting Huang, Shuming Shi, Wai Lam (25 Jun 2024)
WARP: On the Benefits of Weight Averaged Rewarded Policies
Alexandre Ramé, Johan Ferret, Nino Vieillard, Robert Dadashi, Léonard Hussenot, Pierre-Louis Cedoz, Pier Giuseppe Sessa, Sertan Girgin, Arthur Douillard, Olivier Bachem (24 Jun 2024)
On the Transformations across Reward Model, Parameter Update, and In-Context Prompt
Deng Cai, Huayang Li, Tingchen Fu, Siheng Li, Weiwen Xu, ..., Leyang Cui, Yan Wang, Lemao Liu, Taro Watanabe, Shuming Shi (24 Jun 2024) [KELM]
PORT: Preference Optimization on Reasoning Traces
Salem Lahlou, Abdalgader Abubaker, Hakim Hacid (23 Jun 2024) [LRM]
Robust Reinforcement Learning from Corrupted Human Feedback
Alexander Bukharin, Ilgee Hong, Haoming Jiang, Zichong Li, Qingru Zhang, Zixuan Zhang, Tuo Zhao (21 Jun 2024)
Interpretable Preferences via Multi-Objective Reward Modeling and Mixture-of-Experts
Haoxiang Wang, Wei Xiong, Tengyang Xie, Han Zhao, Tong Zhang (18 Jun 2024)
Low-Redundant Optimization for Large Language Model Alignment
Zhipeng Chen, Kun Zhou, Wayne Xin Zhao, Jingyuan Wang, Ji-Rong Wen (18 Jun 2024)
Beyond Under-Alignment: Atomic Preference Enhanced Factuality Tuning for Large Language Models
Hongbang Yuan, Yubo Chen, Pengfei Cao, Zhuoran Jin, Kang Liu, Jun Zhao (18 Jun 2024)
Code-Optimise: Self-Generated Preference Data for Correctness and Efficiency
Leonidas Gee, Milan Gritta, Gerasimos Lampouras, Ignacio Iacobacci (18 Jun 2024)
mDPO: Conditional Preference Optimization for Multimodal Large Language Models
Fei Wang, Wenxuan Zhou, James Y. Huang, Nan Xu, Sheng Zhang, Hoifung Poon, Muhao Chen (17 Jun 2024)