Beyond Reverse KL: Generalizing Direct Preference Optimization with Diverse Divergence Constraints

28 September 2023
Chaoqi Wang, Yibo Jiang, Yuguang Yang, Han Liu, Yuxin Chen

Papers citing "Beyond Reverse KL: Generalizing Direct Preference Optimization with Diverse Divergence Constraints"

Showing 50 of 64 citing papers.
InfoPO: On Mutual Information Maximization for Large Language Model Alignment
Teng Xiao, Zhen Ge, Sujay Sanghavi, Tian Wang, Julian Katz-Samuels, Marc Versage, Qingjun Cui, Trishul M. Chilimbi
13 May 2025
Unilogit: Robust Machine Unlearning for LLMs Using Uniform-Target Self-Distillation
Stefan Vasilev, Christian Herold, Baohao Liao, Seyyed Hadi Hashemi, Shahram Khadivi, Christof Monz
09 May 2025 · MU
Policy-labeled Preference Learning: Is Preference Enough for RLHF?
Taehyun Cho, Seokhun Ju, Seungyub Han, Dohyeong Kim, Kyungjae Lee, Jungwoo Lee
06 May 2025 · OffRL
Target Concrete Score Matching: A Holistic Framework for Discrete Diffusion
Ruixiang Zhang, Shuangfei Zhai, Yizhe Zhang, James Thornton, Zijing Ou, Joshua M. Susskind, Navdeep Jaitly
23 Apr 2025 · DiffM
Algorithm Discovery With LLMs: Evolutionary Search Meets Reinforcement Learning
Anja Surina, Amin Mansouri, Lars Quaedvlieg, Amal Seddas, Maryna Viazovska, Emmanuel Abbe, Çağlar Gülçehre
07 Apr 2025
Robust Reinforcement Learning from Human Feedback for Large Language Models Fine-Tuning
Kai Ye, Hongyi Zhou, Jin Zhu, Francesco Quinzan, C. Shi
03 Apr 2025
Direct Post-Training Preference Alignment for Multi-Agent Motion Generation Models Using Implicit Feedback from Pre-training Demonstrations
Ran Tian, Kratarth Goel
25 Mar 2025
RankPO: Preference Optimization for Job-Talent Matching
Y. Zhang, M. Wang, Yu Wang, Xiaohui Wang
13 Mar 2025
RePO: ReLU-based Preference Optimization
Junkang Wu, Kexin Huang, Xue Wang, Jinyang Gao, Bolin Ding, Jiancan Wu, Xiangnan He, X. Wang
10 Mar 2025
Preserving Cultural Identity with Context-Aware Translation Through Multi-Agent AI Systems
Mahfuz Ahmed Anik, Abdur Rahman, Azmine Toushik Wasi, Md Manjurul Ahsan
05 Mar 2025
Alchemist: Towards the Design of Efficient Online Continual Learning System
Yuyang Huang, Yuhan Liu, Haryadi S. Gunawi, Beibin Li, Changho Hwang
03 Mar 2025 · CLL, OnRL
Cross-Lingual Transfer of Debiasing and Detoxification in Multilingual LLMs: An Extensive Investigation
Vera Neplenbroek, Arianna Bisazza, Raquel Fernández
17 Feb 2025
Diverse Preference Optimization
Jack Lanchantin, Angelica Chen, S. Dhuliawala, Ping Yu, Jason Weston, Sainbayar Sukhbaatar, Ilia Kulikov
30 Jan 2025
Inverse-RLignment: Large Language Model Alignment from Demonstrations through Inverse Reinforcement Learning
Hao Sun, M. Schaar
28 Jan 2025
Disentangling Preference Representation and Text Generation for Efficient Individual Preference Alignment
Jianfei Zhang, Jun Bai, B. Li, Yanmeng Wang, Rumei Li, Chenghua Lin, Wenge Rong
31 Dec 2024
Geometric-Averaged Preference Optimization for Soft Preference Labels
Hiroki Furuta, Kuang-Huei Lee, Shixiang Shane Gu, Y. Matsuo, Aleksandra Faust, Heiga Zen, Izzeddin Gur
31 Dec 2024
Reward Modeling with Ordinal Feedback: Wisdom of the Crowd
Shang Liu, Yu Pan, Guanting Chen, Xiaocheng Li
19 Nov 2024
Chain of Alignment: Integrating Public Will with Expert Intelligence for Language Model Alignment
Andrew Konya, Aviv Ovadya, K. J. Kevin Feng, Quan Ze Chen, Lisa Schirch, Colin Irwin, Amy X. Zhang
15 Nov 2024 · ALM
$f$-PO: Generalizing Preference Optimization with $f$-divergence Minimization
Jiaqi Han, Mingjian Jiang, Yuxuan Song, J. Leskovec, Stefano Ermon
29 Oct 2024
Improving Inverse Folding for Peptide Design with Diversity-regularized Direct Preference Optimization
Ryan Park, Darren J. Hsu, C. Brian Roland, Maria Korshunova, Chen Tessler, Shie Mannor, Olivia Viessmann, Bruno Trentini
25 Oct 2024
Preference Optimization with Multi-Sample Comparisons
Chaoqi Wang, Zhuokai Zhao, Chen Zhu, Karthik Abinav Sankararaman, Michal Valko, ..., Zhaorun Chen, Madian Khabsa, Yuxin Chen, Hao Ma, Sinong Wang
16 Oct 2024
How to Leverage Demonstration Data in Alignment for Large Language Model? A Self-Imitation Learning Perspective
Teng Xiao, Mingxiao Li, Yige Yuan, Huaisheng Zhu, Chao Cui, V. Honavar
14 Oct 2024 · ALM
Beyond Squared Error: Exploring Loss Design for Enhanced Training of Generative Flow Networks
Rui Hu, Yifan Zhang, Zhuoran Li, Longbo Huang
03 Oct 2024
Self-supervised Preference Optimization: Enhance Your Language Model with Preference Degree Awareness
Jian Li, Haojing Huang, Yujia Zhang, Pengfei Xu, Xi Chen, Rui Song, Lida Shi, Jingwen Wang, Hao Xu
26 Sep 2024
Orthogonal Finetuning for Direct Preference Optimization
Chenxu Yang, Ruipeng Jia, Naibin Gu, Zheng-Shen Lin, Siyuan Chen, Chao Pang, Weichong Yin, Yu Sun, Hua-Hong Wu, Weiping Wang
23 Sep 2024
Generalizing Alignment Paradigm of Text-to-Image Generation with Preferences through $f$-divergence Minimization
Haoyuan Sun, Bo Xia, Yongzhe Chang, Xueqian Wang
15 Sep 2024 · EGVM
Alignment of Diffusion Models: Fundamentals, Challenges, and Future
Buhua Liu, Shitong Shao, Bao Li, Lichen Bai, Zhiqiang Xu, Haoyi Xiong, James Kwok, Sumi Helal, Zeke Xie
11 Sep 2024
Forward KL Regularized Preference Optimization for Aligning Diffusion Policies
Zhao Shan, Chenyou Fan, Shuang Qiu, Jiyuan Shi, Chenjia Bai
09 Sep 2024
Towards a Unified View of Preference Learning for Large Language Models: A Survey
Bofei Gao, Feifan Song, Yibo Miao, Zefan Cai, Z. Yang, ..., Houfeng Wang, Zhifang Sui, Peiyi Wang, Baobao Chang
04 Sep 2024
A Comprehensive Survey of LLM Alignment Techniques: RLHF, RLAIF, PPO, DPO and More
Zhichao Wang, Bin Bi, Shiva K. Pentyala, Kiran Ramnath, Sougata Chaudhuri, ..., Z. Zhu, Xiang-Bo Mao, S. Asur, Na Cheng
23 Jul 2024 · OffRL
New Desiderata for Direct Preference Optimization
Xiangkun Hu, Tong He, David Wipf
12 Jul 2024
Decoding-Time Language Model Alignment with Multiple Objectives
Ruizhe Shi, Yifang Chen, Yushi Hu, Alisa Liu, Hannaneh Hajishirzi, Noah A. Smith, Simon Du
27 Jun 2024
Model Merging and Safety Alignment: One Bad Model Spoils the Bunch
Hasan Hammoud, Umberto Michieli, Fabio Pizzati, Philip H. S. Torr, Adel Bibi, Bernard Ghanem, Mete Ozay
20 Jun 2024 · MoMe
A Survey on Human Preference Learning for Large Language Models
Ruili Jiang, Kehai Chen, Xuefeng Bai, Zhixuan He, Juntao Li, Muyun Yang, Tiejun Zhao, Liqiang Nie, Min Zhang
17 Jun 2024
On Softmax Direct Preference Optimization for Recommendation
Yuxin Chen, Junfei Tan, An Zhang, Zhengyi Yang, Leheng Sheng, Enzhi Zhang, Xiang Wang, Tat-Seng Chua
13 Jun 2024
Information Theoretic Guarantees For Policy Alignment In Large Language Models
Youssef Mroueh
09 Jun 2024
Self-Improving Robust Preference Optimization
Eugene Choi, Arash Ahmadian, Matthieu Geist, Olivier Pietquin, M. G. Azar
03 Jun 2024
Direct Alignment of Language Models via Quality-Aware Self-Refinement
Runsheng Yu, Yong Wang, Xiaoqi Jiao, Youzhi Zhang, James T. Kwok
31 May 2024
Offline Regularised Reinforcement Learning for Large Language Models Alignment
Pierre Harvey Richemond, Yunhao Tang, Daniel Guo, Daniele Calandriello, M. G. Azar, ..., Gil Shamir, Rishabh Joshi, Tianqi Liu, Rémi Munos, Bilal Piot
29 May 2024 · OffRL
Prompt Optimization with Human Feedback
Xiaoqiang Lin, Zhongxiang Dai, Arun Verma, See-Kiong Ng, P. Jaillet, K. H. Low
27 May 2024 · AAML
On the Algorithmic Bias of Aligning Large Language Models with RLHF: Preference Collapse and Matching Regularization
Jiancong Xiao, Ziniu Li, Xingyu Xie, E. Getzen, Cong Fang, Qi Long, Weijie J. Su
26 May 2024
Hummer: Towards Limited Competitive Preference Dataset
Li Jiang, Yusen Wu, Junwu Xiong, Jingqing Ruan, Yichuan Ding, Qingpei Guo, Zujie Wen, Jun Zhou, Xiaotie Deng
19 May 2024
Filtered Direct Preference Optimization
Tetsuro Morimura, Mitsuki Sakamoto, Yuu Jinnai, Kenshi Abe, Kaito Ariu
22 Apr 2024
Token-level Direct Preference Optimization
Yongcheng Zeng, Guoqing Liu, Weiyu Ma, Ning Yang, Haifeng Zhang, Jun Wang
18 Apr 2024
Stepwise Alignment for Constrained Language Model Policy Optimization
Akifumi Wachi, Thien Q. Tran, Rei Sato, Takumi Tanabe, Yohei Akimoto
17 Apr 2024
Learn Your Reference Model for Real Good Alignment
Alexey Gorbatovski, Boris Shaposhnikov, Alexey Malakhov, Nikita Surnachev, Yaroslav Aksenov, Ian Maksimov, Nikita Balagansky, Daniil Gavrilov
15 Apr 2024 · OffRL
JetMoE: Reaching Llama2 Performance with 0.1M Dollars
Yikang Shen, Zhen Guo, Tianle Cai, Zengyi Qin
11 Apr 2024 · MoE, ALM
Towards Analyzing and Understanding the Limitations of DPO: A Theoretical Perspective
Duanyu Feng, Bowen Qin, Chen Huang, Zheng-Wei Zhang, Wenqiang Lei
06 Apr 2024
ROPO: Robust Preference Optimization for Large Language Models
Xize Liang, Chao Chen, Shuang Qiu, Jie Wang, Yue-bo Wu, Zhihang Fu, Zhihao Shi, Feng Wu, Jieping Ye
05 Apr 2024
Rethinking Kullback-Leibler Divergence in Knowledge Distillation for Large Language Models
Taiqiang Wu, Chaofan Tao, Jiahao Wang, Zhe Zhao, Ngai Wong
03 Apr 2024 · ALM