arXiv:2403.07691
ORPO: Monolithic Preference Optimization without Reference Model
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
12 March 2024
Jiwoo Hong
Noah Lee
James Thorne
OSLM
Papers citing "ORPO: Monolithic Preference Optimization without Reference Model" (50 of 253 shown)
When Human Preferences Flip: An Instance-Dependent Robust Loss for RLHF
  Yifan Xu, Xichen Ye, Yifan Chen, Qiaosheng Zhang (30 Nov 2025)
Ambiguity Awareness Optimization: Towards Semantic Disambiguation for Direct Preference Optimization
  Jian Li, Shenglin Yin, Yujia Zhang, Alan Zhao, Xi Chen, Xiaohui Zhou, Pengfei Xu (28 Nov 2025)
Towards Continuous Intelligence Growth: Self-Training, Continual Learning, and Dual-Scale Memory in SuperIntelliAgent
  Jianzhe Lin, Zeyu Pan, Yun Zhu, Ruiqi Song, Jining Yang (28 Nov 2025) [LRM]
Real-Time Long Horizon Air Quality Forecasting via Group-Relative Policy Optimization
  Inha Kang, Eunki Kim, Wonjeong Ryu, Jaeyo Shin, Seungjun Yu, Yoon-Hee Kang, Seongeun Jeong, Eunhye Kim, Soontae Kim, Hyunjung Shim (27 Nov 2025)
STORE: Semantic Tokenization, Orthogonal Rotation and Efficient Attention for Scaling Up Ranking Models
  Y. Xu, Chaofan Fan, J. Hu, Yu Zhang, Zeng Xiaoyi, J. Zhang (EMNLP 2025, 24 Nov 2025)
The PLLuM Instruction Corpus
  Piotr Pęzik, Filip Żarnecki, Konrad Kaczyński, A. Cichosz, Zuzanna Deckert, ..., Konrad Wojtasik, Arkadiusz Janz, P. Kazienko, Julia Moska, Jan Kocoń (21 Nov 2025)
Fine-Tuned LLMs Know They Don't Know: A Parameter-Efficient Approach to Recovering Honesty
  Zeyu Shi, Ziming Wang, Tianyu Chen, Shiqi Gao, Haoyi Zhou, Qingyun Sun, Jianxin Li (17 Nov 2025)
VULPO: Context-Aware Vulnerability Detection via On-Policy LLM Optimization
  Youpeng Li, Fuxun Yu, Xinda Wang (14 Nov 2025) [OffRL]
Textual Self-attention Network: Test-Time Preference Optimization through Textual Gradient-based Attention
  Shibing Mo, Haoyang Ruan, Kai Wu, Jing Liu (10 Nov 2025)
HLPD: Aligning LLMs to Human Language Preference for Machine-Revised Text Detection
  Fangqi Dai, Xingjian Jiang, Zizhuang Deng (10 Nov 2025) [DeLMO]
Reg-DPO: SFT-Regularized Direct Preference Optimization with GT-Pair for Improving Video Generation
  Jie Du, Xinyu Gong, Qingshan Tan, W. Li, Yangming Cheng, Weitao Wang, Chenlu Zhan, Suhui Wu, H. Zhang, J. Zhang (03 Nov 2025) [VGen]
Prompt-R1: Collaborative Automatic Prompting Framework via End-to-end Reinforcement Learning
  Wenjin Liu, Haoran Luo, X. Lin, Haoming Liu, Tiesunlong Shen, Jiapu Wang, Rui Mao, Erik Cambria (02 Nov 2025) [LLMAG, OffRL, LRM]
BOTS: A Unified Framework for Bayesian Online Task Selection in LLM Reinforcement Finetuning
  Qianli Shen, Daoyuan Chen, Yilun Huang, Zhenqing Ling, Yaliang Li, Bolin Ding, Jingren Zhou (30 Oct 2025) [OffRL]
Learning "Partner-Aware" Collaborators in Multi-Party Collaboration
  Abhijnan Nath, Nikhil Krishnaswamy (26 Oct 2025)
Beyond Pairwise: Empowering LLM Alignment With Ranked Choice Modeling
  Yuxuan Tang, Yifan Feng (24 Oct 2025)
OG-Rank: Learning to Rank Fast and Slow with Uncertainty and Reward-Trend Guided Adaptive Exploration
  Praphul Singh, Corey D Barrett, Sumana Srivasta, Irfan Bulu, Sri Gadde, Krishnaram Kenthapadi (20 Oct 2025) [OffRL, BDL, CML]
RL makes MLLMs see better than SFT
  Junha Song, Sangdoo Yun, Dongyoon Han, Jaegul Choo, Byeongho Heo (18 Oct 2025) [OffRL]
18 Oct 2025
POPI: Personalizing LLMs via Optimized Natural Language Preference Inference
Yizhuo Chen
Xin Liu
Ruijie Wang
Zheng Li
Pei Chen
Changlong Yu
Priyanka Nigam
Meng Jiang
Bing Yin
120
1
0
17 Oct 2025
Beyond Single-Reward: Multi-Pair, Multi-Perspective Preference Optimization for Machine Translation
Hao Wang
Linlong Xu
Heng Liu
Y. Liu
Xiaohu Zhao
Bo Zeng
Liangying Shao
Longyue Wang
Weihua Luo
Kaifu Zhang
121
0
0
15 Oct 2025
On the Role of Preference Variance in Preference Optimization
Jiacheng Guo
Zihao Li
Jiahao Qiu
Yue Wu
Mengdi Wang
148
2
0
14 Oct 2025
Answer-Consistent Chain-of-thought Reinforcement Learning For Multi-modal Large Langauge Models
Minbin Huang
Runhui Huang
Chuanyang Zheng
Jingyao Li
Guoxuan Chen
Han Shi
Hong Cheng
KELM
LRM
122
0
0
11 Oct 2025
Token-Level Policy Optimization: Linking Group-Level Rewards to Token-Level Aggregation via Markov Likelihood
Xingyu Lin
Yilin Wen
E. Wang
Du Su
Wenbin Liu
Chenfu Bao
Zhonghou Lv
88
1
0
10 Oct 2025
Hierarchical Scheduling for Multi-Vector Image Retrieval
Maoliang Li
K. Li
Yaoyang Liu
Jiayu Chen
Zihao Zheng
Yinjun Wu
Xiang Chen
118
1
0
10 Oct 2025
Mix- and MoE-DPO: A Variational Inference Approach to Direct Preference Optimization
Jason Bohne
Pawel Polak
David S. Rosenberg
Brian Bloniarz
Gary Kazantsev
146
1
0
09 Oct 2025
Incremental Summarization for Customer Support via Progressive Note-Taking and Agent Feedback
Yisha Wu
Cen Mia Zhao
Yuanpei Cao
Xiaoqing Su
Yashar Mehdad
Mindy Ji
Claire Na Cheng
127
0
0
08 Oct 2025
Agent-in-the-Loop: A Data Flywheel for Continuous Improvement in LLM-based Customer Support
Cen Mia Zhao
Tiantian Zhang
Hanchen Su
Y. Zhang
Shaowei Su
...
Yu Elaine Liu
Wei Han
Jeremy Werner
Claire Na Cheng
Yashar Mehdad
116
0
0
08 Oct 2025
Harnessing Synthetic Preference Data for Enhancing Temporal Understanding of Video-LLMs
  Sameep Vani, Shreyas Jena, Maitreya Patel, Chitta Baral, Somak Aditya, Yezhou Yang (04 Oct 2025) [AI4TS, SyDa]
Improving Sampling Efficiency in RLVR through Adaptive Rollout and Response Reuse
  Yuheng Zhang, Wenlin Yao, Changlong Yu, Yao Liu, Qingyu Yin, Bing Yin, Hyokun Yun, Lihong Li (30 Sep 2025)
Alignment-Aware Decoding
  Frédéric Berdoz, Luca A. Lanzendörfer, René Caky, Roger Wattenhofer (30 Sep 2025)
ORPO-Distill: Mixed-Policy Preference Optimization for Cross-Architecture LLM Distillation
  Aasheesh Singh, Vishal Vaddina, Dagnachew Birru (29 Sep 2025)
Clean First, Align Later: Benchmarking Preference Data Cleaning for Reliable LLM Alignment
  Min-Hsuan Yeh, Yixuan Li (28 Sep 2025)
Toward Preference-aligned Large Language Models via Residual-based Model Steering
  Lucio La Cava, Andrea Tagarelli (28 Sep 2025) [LLMSV]
Aligning LLMs for Multilingual Consistency in Enterprise Applications
  Amit Agarwal, Hansa Meghwani, Hitesh Laxmichand Patel, Tao Sheng, Sujith Ravi, Dan Roth (28 Sep 2025)
Diagnose, Localize, Align: A Full-Stack Framework for Reliable LLM Multi-Agent Systems under Instruction Conflicts
  Guancheng Wan, Leixin Sun, Longxu Dou, Zitong Shi, Fang Wu, ..., Hejia Geng, Xiangru Tang, Z. Yin, Yizhou Sun, Wei Wang (27 Sep 2025)
Multiplayer Nash Preference Optimization
  Fang Wu, X. Y. Huang, Weihao Xuan, Zhiwei Zhang, Yijia Xiao, ..., Xiaomin Li, Bing Hu, Peng Xia, Jure Leskovec, Yejin Choi (27 Sep 2025)
Failure Modes of Maximum Entropy RLHF
  Ömer Veysel Çağatan, Barış Akgün (24 Sep 2025)
TempSamp-R1: Effective Temporal Sampling with Reinforcement Fine-Tuning for Video LLMs
  Yunheng Li, Jing Cheng, Shaoyong Jia, Hangyi Kuang, Shaohui Jiao, Qibin Hou, Ming-Ming Cheng (22 Sep 2025) [AI4TS, VLM]
Captioning for Text-Video Retrieval via Dual-Group Direct Preference Optimization
  Ji Soo Lee, Byungoh Ko, Jaewon Cho, Howoong Lee, Jaewoon Byun, Hyunwoo J. Kim (20 Sep 2025)
GPO: Learning from Critical Steps to Improve LLM Reasoning
  Jiahao Yu, Zelei Cheng, Xian Wu, Xinyu Xing (19 Sep 2025) [LRM]
Rethinking the Evaluation of Alignment Methods: Insights into Diversity, Generalisation, and Safety
  Denis Janiak, Julia Moska, Dawid Motyka, Karolina Seweryn, Paweł Walkowiak, Bartosz Żuk, Arkadiusz Janz (16 Sep 2025) [ALM]
Building Coding Agents via Entropy-Enhanced Multi-Turn Preference Optimization
  Jiahao Yu, Zelei Cheng, Xian Wu, Xinyu Xing (15 Sep 2025)
When Safe Unimodal Inputs Collide: Optimizing Reasoning Chains for Cross-Modal Safety in Multimodal Large Language Models
  Wei Cai, Shujuan Liu, Jian Zhao, Ziyan Shi, Yusheng Zhao, Yuchen Yuan, Tianle Zhang, Chi Zhang, Xuelong Li (15 Sep 2025) [LRM]
DetectAnyLLM: Towards Generalizable and Robust Detection of Machine-Generated Text Across Domains and Models
  Jiachen Fu, Chun-Le Guo, Chongyi Li (15 Sep 2025) [MLLM, DeLMO]
Opal: An Operator Algebra View of RLHF
  Madhava Gaikwad (14 Sep 2025)
SearchInstruct: Enhancing Domain Adaptation via Retrieval-Based Instruction Dataset Creation
  Iman Barati, Mostafa Amiri, Heshaam Faili (12 Sep 2025) [ALM]
CM-Align: Consistency-based Multilingual Alignment for Large Language Models
  Xue Zhang, Yunlong Liang, Fandong Meng, Songming Zhang, Yufeng Chen, Jinan Xu, Jie Zhou (10 Sep 2025)
The Thinking Therapist: Training Large Language Models to Deliver Acceptance and Commitment Therapy using Supervised Fine-Tuning and Odds Ratio Policy Optimization
  Talha Tahir (08 Sep 2025) [OffRL, LRM]
Let's Roleplay: Examining LLM Alignment in Collaborative Dialogues
  Abhijnan Nath, Carine Graff, Nikhil Krishnaswamy (07 Sep 2025) [LLMAG]
Adaptive Preference Optimization with Uncertainty-aware Utility Anchor
  Xiaobo Wang, Zixia Jia, Jiaqi Li, Qi Liu, Zilong Zheng (03 Sep 2025)
Learning to Generate Unit Test via Adversarial Reinforcement Learning
  Dongjun Lee, Changho Hwang, Kimin Lee (28 Aug 2025)