2403.07691
ORPO: Monolithic Preference Optimization without Reference Model
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
12 March 2024
Jiwoo Hong
Noah Lee
James Thorne
OSLM
HuggingFace (67 upvotes)
Papers citing "ORPO: Monolithic Preference Optimization without Reference Model" (50 of 252 papers shown)
Beyond Scalar Reward Model: Learning Generative Judge from Preference Data
Ziyi Ye
Xiangsheng Li
Qiuchi Li
Jiaxin Mao
Yujia Zhou
Wei Shen
Dong Yan
Yiqun Liu
226
32
0
01 Oct 2024
Evaluation of Large Language Models for Summarization Tasks in the Medical Domain: A Narrative Review
Emma Croxford
Yanjun Gao
Nicholas Pellegrino
Karen K. Wong
Graham Wills
Elliot First
Frank J. Liao
Cherodeep Goswami
Brian Patterson
Majid Afshar
HILM
ELM
LM&MA
317
4
0
26 Sep 2024
Self-supervised Preference Optimization: Enhance Your Language Model with Preference Degree Awareness
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Jian Li
Haojing Huang
Yujia Zhang
Pengfei Xu
Xi Chen
Rui Song
Lida Shi
Jingwen Wang
Hao Xu
120
2
0
26 Sep 2024
Modulated Intervention Preference Optimization (MIPO): Keep the Easy, Refine the Difficult
Cheolhun Jang
248
0
0
26 Sep 2024
Just Say What You Want: Only-prompting Self-rewarding Online Preference Optimization
Ruijie Xu
Zhihan Liu
Yongfei Liu
Shipeng Yan
Zhaoran Wang
Zhi-Li Zhang
Xuming He
ALM
221
1
0
26 Sep 2024
Aligning Language Models Using Follow-up Likelihood as Reward Signal
AAAI Conference on Artificial Intelligence (AAAI), 2024
Chen Zhang
Dading Chong
Feng Jiang
Chengguang Tang
Anningzhe Gao
Guohua Tang
Haizhou Li
ALM
259
6
0
20 Sep 2024
CamelEval: Advancing Culturally Aligned Arabic Language Models and Benchmarks
Zhaozhi Qian
Faroq Altam
Muhammad Alqurishi
Riad Souissi
158
11
0
19 Sep 2024
From Lists to Emojis: How Format Bias Affects Model Alignment
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Xuanchang Zhang
Wei Xiong
Lichang Chen
Wanrong Zhu
Heng Huang
Tong Zhang
ALM
418
19
0
18 Sep 2024
KodeXv0.1: A Family of State-of-the-Art Financial Large Language Models
Neel Rajani
Lilli Kiessling
Aleksandr Ogaltsov
Claus Lang
ALM
131
0
0
13 Sep 2024
AIPO: Improving Training Objective for Iterative Preference Optimization
Yaojie Shen
Xinyao Wang
Yulei Niu
Ying Zhou
Lexin Tang
Libo Zhang
Fan Chen
Longyin Wen
249
2
0
13 Sep 2024
Propaganda is all you need
Paul Kronlund-Drouault
222
1
0
13 Sep 2024
Towards a Unified View of Preference Learning for Large Language Models: A Survey
Bofei Gao
Feifan Song
Yibo Miao
Zefan Cai
Zhiyong Yang
...
Houfeng Wang
Zhifang Sui
Peiyi Wang
Baobao Chang
396
16
0
04 Sep 2024
SIaM: Self-Improving Code-Assisted Mathematical Reasoning of Large Language Models
Dian Yu
Baolin Peng
Ye Tian
Linfeng Song
Haitao Mi
Dong Yu
ALM
LRM
176
4
0
28 Aug 2024
RoVRM: A Robust Visual Reward Model Optimized via Auxiliary Textual Preference Data
Chenglong Wang
Yang Gan
Yifu Huo
Yongyu Mu
Murun Yang
...
Chunliang Zhang
Tongran Liu
Quan Du
Di Yang
Jingbo Zhu
VLM
370
11
0
22 Aug 2024
Value Alignment from Unstructured Text
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Inkit Padhi
Karthikeyan N. Ramamurthy
P. Sattigeri
Manish Nagireddy
Pierre Dognin
Kush R. Varshney
203
0
0
19 Aug 2024
Minor DPO reject penalty to increase training robustness
Shiming Xie
Hong Chen
Fred Yu
Zeye Sun
Xiuyu Wu
Yingfan Hu
182
5
0
19 Aug 2024
The Fellowship of the LLMs: Multi-Agent Workflows for Synthetic Preference Optimization Dataset Generation
Samee Arif
Sualeha Farid
Abdul Hameed Azeemi
Awais Athar
Agha Ali Raza
LLMAG
394
10
0
16 Aug 2024
Bridging and Modeling Correlations in Pairwise Data for Direct Preference Optimization
International Conference on Learning Representations (ICLR), 2024
Yuxin Jiang
Bo Huang
Yufei Wang
Xingshan Zeng
Liangyou Li
Yasheng Wang
Xin Jiang
Lifeng Shang
Ruiming Tang
Wei Wang
268
4
0
14 Aug 2024
Exploring Applications of State Space Models and Advanced Training Techniques in Sequential Recommendations: A Comparative Study on Efficiency and Performance
M. Obozov
Makar Baderko
Stepan Kulibaba
N. Kutuzov
Alexander Gasnikov
Mamba
OffRL
295
0
0
10 Aug 2024
Towards Explainable Network Intrusion Detection using Large Language Models
Paul R. B. Houssel
Priyanka Singh
S. Layeghy
Marius Portmann
161
16
0
08 Aug 2024
ABC Align: Large Language Model Alignment for Safety & Accuracy
Gareth Seneque
Lap-Hang Ho
Peter W. Glynn
Yinyu Ye
Jeffrey Molendijk
166
1
0
01 Aug 2024
ALLaM: Large Language Models for Arabic and English
M Saiful Bari
Yazeed Alnumay
Norah A. Alzahrani
Nouf M. Alotaibi
H. A. Alyahya
...
Jeril Kuriakose
Abdalghani Abujabal
Nora Al-Twairesh
Areeb Alowisheq
Haidar Khan
204
42
0
22 Jul 2024
Weak-to-Strong Reasoning
Yuqing Yang
Yan Ma
Pengfei Liu
LRM
278
29
0
18 Jul 2024
Research on Tibetan Tourism Viewpoints information generation system based on LLM
Jinhu Qi
Shuai Yan
Wentao Zhang
Yibo Zhang
Zirui Liu
Ke Wang
181
2
0
18 Jul 2024
New Desiderata for Direct Preference Optimization
Xiangkun Hu
Tong He
David Wipf
155
5
0
12 Jul 2024
LIONs: An Empirically Optimized Approach to Align Language Models
Xiao Yu
Qingyang Wu
Yu Li
Zhou Yu
ALM
225
6
0
09 Jul 2024
Suri: Multi-constraint Instruction Following for Long-form Text Generation
Chau Minh Pham
Simeng Sun
Mohit Iyyer
ALM
LRM
265
35
0
27 Jun 2024
Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs
Xin Lai
Zhuotao Tian
Yukang Chen
Senqiao Yang
Xiangru Peng
Jiaya Jia
LRM
370
205
0
26 Jun 2024
PAFT: A Parallel Training Paradigm for Effective LLM Fine-Tuning
Shiva K. Pentyala
Zhichao Wang
Bin Bi
Kiran Ramnath
Xiang-Bo Mao
Regunathan Radhakrishnan
S. Asur
Na Cheng
MoMe
230
12
0
25 Jun 2024
PORT: Preference Optimization on Reasoning Traces
Salem Lahlou
Abdalgader Abubaker
Hakim Hacid
LRM
303
7
0
23 Jun 2024
A Tale of Trust and Accuracy: Base vs. Instruct LLMs in RAG Systems
Florin Cuconasu
Giovanni Trappolini
Nicola Tonellotto
Fabrizio Silvestri
188
4
0
21 Jun 2024
Aligning Large Language Models with Diverse Political Viewpoints
Dominik Stammbach
Philine Widmer
Eunjung Cho
Çağlar Gülçehre
Elliott Ash
230
6
0
20 Jun 2024
Low-Redundant Optimization for Large Language Model Alignment
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Zhipeng Chen
Kun Zhou
Wayne Xin Zhao
Jingyuan Wang
Ji-Rong Wen
206
0
0
18 Jun 2024
Code-Optimise: Self-Generated Preference Data for Correctness and Efficiency
Leonidas Gee
Milan Gritta
Gerasimos Lampouras
Ignacio Iacobacci
295
13
0
18 Jun 2024
WPO: Enhancing RLHF with Weighted Preference Optimization
Wenxuan Zhou
Ravi Agrawal
Shujian Zhang
Sathish Indurthi
Sanqiang Zhao
Kaiqiang Song
Silei Xu
Chenguang Zhu
269
35
0
17 Jun 2024
A Survey on Human Preference Learning for Large Language Models
Ruili Jiang
Kehai Chen
Xuefeng Bai
Zhixuan He
Juntao Li
Muyun Yang
Tiejun Zhao
Liqiang Nie
Min Zhang
267
16
0
17 Jun 2024
Step-level Value Preference Optimization for Mathematical Reasoning
Guoxin Chen
Minpeng Liao
Chengxi Li
Kai Fan
LRM
199
64
0
16 Jun 2024
Diffusion-RPO: Aligning Diffusion Models through Relative Preference Optimization
Yi Gu
Zhendong Wang
Yueqin Yin
Yujia Xie
Mingyuan Zhou
212
30
0
10 Jun 2024
Margin-aware Preference Optimization for Aligning Diffusion Models without Reference
Jiwoo Hong
Sayak Paul
Noah Lee
Kashif Rasul
James Thorne
Jongheon Jeong
254
29
0
10 Jun 2024
PrivacyRestore: Privacy-Preserving Inference in Large Language Models via Privacy Removal and Restoration
Huiping Zhuang
Jianwei Wang
Zhengdong Lu
Haoran Li
Cen Chen
RALM
KELM
592
14
0
03 Jun 2024
Online Merging Optimizers for Boosting Rewards and Mitigating Tax in Alignment
Keming Lu
Bowen Yu
Fei Huang
Yang Fan
Runji Lin
Chang Zhou
MoMe
185
26
0
28 May 2024
Provably Mitigating Overoptimization in RLHF: Your SFT Loss is Implicitly an Adversarial Regularizer
Zhihan Liu
Miao Lu
Shenao Zhang
Boyi Liu
Hongyi Guo
Yingxiang Yang
Jose H. Blanchet
Zhaoran Wang
335
84
0
26 May 2024
SimPO: Simple Preference Optimization with a Reference-Free Reward
Neural Information Processing Systems (NeurIPS), 2024
Yu Meng
Mengzhou Xia
Danqi Chen
489
761
0
23 May 2024
360Zhinao Technical Report
360Zhinao Team
205
0
0
22 May 2024
(Perhaps) Beyond Human Translation: Harnessing Multi-Agent Collaboration for Translating Ultra-Long Literary Texts
Minghao Wu
Jiahao Xu
Yulin Yuan
Gholamreza Haffari
Longyue Wang
Weihua Luo
Kaifu Zhang
LLMAG
561
43
0
20 May 2024
Advanced Natural-based interaction for the ITAlian language: LLaMAntino-3-ANITA
Marco Polignano
Pierpaolo Basile
Giovanni Semeraro
212
34
0
11 May 2024
D2PO: Discriminator-Guided DPO with Response Evaluation Models
Prasann Singhal
Nathan Lambert
S. Niekum
Tanya Goyal
Greg Durrett
OffRL
EGVM
183
7
0
02 May 2024
Self-Explore to Avoid the Pit: Improving the Reasoning Capabilities of Language Models with Fine-grained Rewards
Hyeonbin Hwang
Doyoung Kim
Seungone Kim
Seonghyeon Ye
Minjoon Seo
LRM
ReLM
309
28
0
16 Apr 2024
Regularized Best-of-N Sampling with Minimum Bayes Risk Objective for Language Model Alignment
Yuu Jinnai
Tetsuro Morimura
Kaito Ariu
Kenshi Abe
407
9
0
01 Apr 2024
LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models
Yaowei Zheng
Richong Zhang
Junhao Zhang
Yanhan Ye
Zheyan Luo
Zhangchi Feng
Yongqiang Ma
612
1,102
0
20 Mar 2024