ORPO: Monolithic Preference Optimization without Reference Model

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
12 March 2024
Jiwoo Hong, Noah Lee, James Thorne
Tags: OSLM
Links: arXiv (abs) · PDF · HTML · HuggingFace (67 upvotes)
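
For reference, a brief sketch of ORPO's objective, following the paper's formulation (y_w and y_l are the chosen and rejected responses; λ weights the odds-ratio term):

\begin{aligned}
\operatorname{odds}_\theta(y \mid x) &= \frac{P_\theta(y \mid x)}{1 - P_\theta(y \mid x)}, \\
\mathcal{L}_{OR} &= -\log \sigma\!\left(\log \frac{\operatorname{odds}_\theta(y_w \mid x)}{\operatorname{odds}_\theta(y_l \mid x)}\right), \\
\mathcal{L}_{ORPO} &= \mathbb{E}_{(x,\, y_w,\, y_l)}\big[\mathcal{L}_{SFT} + \lambda \cdot \mathcal{L}_{OR}\big],
\end{aligned}

where \mathcal{L}_{SFT} is the standard supervised (negative log-likelihood) loss on the chosen response. No frozen reference model appears in the objective, which is the paper's titular point.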

Papers citing "ORPO: Monolithic Preference Optimization without Reference Model"

Showing 50 of 252 citing papers.

Mitigating Hallucinations in Multimodal LLMs via Object-aware Preference Optimization
Alberto Compagnoni, Davide Caffagni, Nicholas Moratelli, Lorenzo Baraldi, Marcella Cornia, Rita Cucchiara
Tags: MLLM
27 Aug 2025

HAEPO: History-Aggregated Exploratory Policy Optimization
Gaurish Trivedi, Alakh Sharma, Kartikey Singh Bhandari, Dhruv Kumar, Pratik Narang, Jagat Sesh Challa
26 Aug 2025

Weights-Rotated Preference Optimization for Large Language Models
Chenxu Yang, Ruipeng Jia, Mingyu Zheng, Naibin Gu, Zheng Lin, Siyuan Chen, Weichong Yin, Hua Wu, Weiping Wang
25 Aug 2025

What Matters in Data for DPO?
Yu Pan, Zhongze Cai, Guanting Chen, Huaiyang Zhong, Chonghuan Wang
23 Aug 2025

Fusing Rewards and Preferences in Reinforcement Learning
Sadegh Khorasani, Saber Salehkaleybar, Negar Kiyavash, Matthias Grossglauser
15 Aug 2025

Pruning Long Chain-of-Thought of Large Reasoning Models via Small-Scale Preference Optimization
Bin Hong, Jiayu Liu, Zhenya Huang, Kai Zhang, Mengdi Zhang
Tags: LRM
13 Aug 2025

ParallelSearch: Train your LLMs to Decompose Query and Search Sub-queries in Parallel with Reinforcement Learning
Shu Zhao, Tan Yu, Anbang Xu, Japinder Singh, Aaditya Shukla, Rama Akkiraju
Tags: ReLM, AI4TS, LRM
12 Aug 2025

Forgetting: A New Mechanism Towards Better Large Language Model Fine-tuning
Ali Taheri Ghahrizjani, Alireza Taban, Qizhou Wang, Shanshan Ye, Tongliang Liu
Tags: CLL, MU
06 Aug 2025

V.I.P.: Iterative Online Preference Distillation for Efficient Video Diffusion Models
Jisoo Kim, Wooseok Seo, Junwan Kim, Seungho Park, Sooyeon Park, Youngjae Yu
Tags: VGen
05 Aug 2025

Co-rewarding: Stable Self-supervised RL for Eliciting Reasoning in Large Language Models
Zizhuo Zhang, Jianing Zhu, Xinmu Ge, Zihua Zhao, Zhanke Zhou, Xuan Li, Xiao Feng, Jiangchao Yao, Bo Han
Tags: ALM, LRM
01 Aug 2025

UAV-ON: A Benchmark for Open-World Object Goal Navigation with Aerial Agents
Jianqiang Xiao, Yuexuan Sun, Yixin Shao, Boxi Gan, Rongqiang Liu, Yanjing Wu, Weili Gua, Xiang Deng
01 Aug 2025

MoHoBench: Assessing Honesty of Multimodal Large Language Models via Unanswerable Visual Questions
Yanxu Zhu, Shitong Duan, Xiangxu Zhang, Jitao Sang, Peng Zhang, Tun Lu, Xiao Zhou, Jing Yao, Xiaoyuan Yi, Xing Xie
29 Jul 2025

MaPPO: Maximum a Posteriori Preference Optimization with Prior Knowledge
Guangchen Lan, Sipeng Zhang, Tianle Wang, Yuwei Zhang, Daoan Zhang, Xinpeng Wei, Xiaoman Pan, Hongming Zhang, Dong-Jun Han, Christopher G. Brinton
27 Jul 2025

SGPO: Self-Generated Preference Optimization based on Self-Improver
Hyeonji Lee, DaeJin Jo, Seohwan Yun, Sungwoong Kim
Tags: SyDa
27 Jul 2025

Mitigating Geospatial Knowledge Hallucination in Large Language Models: Benchmarking and Dynamic Factuality Aligning
Shengyuan Wang, J. Feng, Tianhui Liu, Dan Pei, Yong Li
Tags: HILM
25 Jul 2025

Reinforcement Learning Fine-Tunes a Sparse Subnetwork in Large Language Models
Andrii Balashov
23 Jul 2025

Chinchunmei at SemEval-2025 Task 11: Boosting the Large Language Model's Capability of Emotion Perception using Contrastive Learning
Tian Li, Yujian Sun, Huizhi Liang
21 Jul 2025

Quantile Reward Policy Optimization: Alignment with Pointwise Regression and Exact Partition Functions
Simon Matrenok, Skander Moalla, Çağlar Gülçehre
10 Jul 2025

Principled Foundations for Preference Optimization
Wenxuan Zhou, Shujian Zhang, Brice Magdalou, John Lambert, Ehsan Amid, Richard Nock, Andrew Straiton Hard
10 Jul 2025

ESSA: Evolutionary Strategies for Scalable Alignment
Daria Korotyshova, Boris Shaposhnikov, Alexey Malakhov, Alexey Khokhulin, Nikita Surnachev, Kirill Ovcharenko, George Bredis, Alexey Gorbatovski, Viacheslav Sinii, Daniil Gavrilov
06 Jul 2025

The Hidden Link Between RLHF and Contrastive Learning
Xufei Lv, Kehai Chen, Haoyuan Sun, X. Bai, Min Zhang, Houde Liu
27 Jun 2025

Using cognitive models to reveal value trade-offs in language models
Sonia K. Murthy, Rosie Zhao, Jennifer Hu, Sham Kakade, Markus Wulfmeier, Peng Qian, Tomer Ullman
25 Jun 2025

SPARE: Single-Pass Annotation with Reference-Guided Evaluation for Automatic Process Supervision and Reward Modelling
Md Imbesat Hassan Rizvi, Xiaodan Zhu, Iryna Gurevych
Tags: LRM
18 Jun 2025

Rethinking DPO: The Role of Rejected Responses in Preference Misalignment
Jay Hyeon Cho, JunHyeok Oh, Myunsoo Kim, Byung-Jun Lee
15 Jun 2025

Preference-Driven Multi-Objective Combinatorial Optimization with Conditional Computation
Mingfeng Fan, Jianan Zhou, Yifeng Zhang, Yaoxin Wu, Jinbiao Chen, Guillaume Sartoretti
Tags: AI4CE
10 Jun 2025

Reinforce LLM Reasoning through Multi-Agent Reflection
Yurun Yuan, Tengyang Xie
Tags: LRM
10 Jun 2025

Explicit Preference Optimization: No Need for an Implicit Reward Model
Xiangkun Hu, Lemin Kong, Tong He, David Wipf
09 Jun 2025

Adaptive Batch-Wise Sample Scheduling for Direct Preference Optimization
Zixuan Huang, Yikun Ban, Lean Fu, Xiaojie Li, Zhongxiang Dai, Jianxin Li, Deqing Wang
08 Jun 2025

Debiasing Online Preference Learning via Preference Feature Preservation
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Dongyoung Kim, Jinsung Yoon, Jinwoo Shin, Jaehyung Kim
06 Jun 2025

APT: Improving Specialist LLM Performance with Weakness Case Acquisition and Iterative Preference Training
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Jun Rao, Zepeng Lin, Xuebo Liu, Xiaopeng Ke, Lian Lian, Dong Jin, Shengjun Cheng, Jun Yu, Min Zhang
04 Jun 2025

Robust Preference Optimization via Dynamic Target Margins
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Jie Sun, Junkang Wu, Jiancan Wu, Zhibo Zhu, Xingyu Lu, Jun Zhou, Lintao Ma, Xiang Wang
04 Jun 2025

Doubly Robust Alignment for Large Language Models
Erhan Xu, Kai Ye, Hongyi Zhou, Luhan Zhu, Francesco Quinzan, Chengchun Shi
01 Jun 2025

MDPO: Multi-Granularity Direct Preference Optimization for Mathematical Reasoning
Yunze Lin
Tags: LRM
30 May 2025

Discriminative Policy Optimization for Token-Level Reward Models
Hongzhan Chen, Tao Yang, Shiping Gao, Ruijun Chen, Xiaojun Quan, Hongtao Tian, Ting Yao
29 May 2025

Learning Parametric Distributions from Samples and Preferences
Marc Jourdan, Gizem Yüce, Nicolas Flammarion
29 May 2025

Reinforcement Learning for Better Verbalized Confidence in Long-Form Generation
Caiqi Zhang, Xiaochen Zhu, Chengzu Li, Nigel Collier, Andreas Vlachos
Tags: OffRL, HILM
29 May 2025

Probability-Consistent Preference Optimization for Enhanced LLM Reasoning
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Yunqiao Yang, Houxing Ren, Zimu Lu, Ke Wang, Weikang Shi, A-Long Zhou, Junting Pan, Mingjie Zhan, Hongsheng Li
Tags: LRM
29 May 2025

Modeling and Optimizing User Preferences in AI Copilots: A Comprehensive Survey and Taxonomy
Saleh Afzoon, Zahra Jahanandish, Phuong Thao Huynh, Amin Beheshti, Usman Naseem
28 May 2025

SDPO: Importance-Sampled Direct Preference Optimization for Stable Diffusion Training
Xiaomeng Yang, Zhiyu Tan, Junyan Wang, Zhijian Zhou, Hao Li
28 May 2025

Token-Importance Guided Direct Preference Optimization
Yang Ning, Lin Hai, Liu Yibo, Tian Baoliang, Liu Guoqing, Zhang Haijun
26 May 2025

Frictional Agent Alignment Framework: Slow Down and Don't Break Things
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Abhijnan Nath, Carine Graff, Andrei Bachinin, Nikhil Krishnaswamy
26 May 2025

Controlling Language Confusion in Multilingual LLMs
Nahyun Lee, Yeongseo Woo, Hyunwoo Ko, Guijin Son
25 May 2025

Rethinking Direct Preference Optimization in Diffusion Models
Junyong Kang, Seohyun Lim, Kyungjune Baek, Hyunjung Shim
24 May 2025

Bridging Supervised Learning and Reinforcement Learning in Math Reasoning
Huayu Chen, Kaiwen Zheng, Qinsheng Zhang, Ganqu Cui, Yin Cui, Haotian Ye, Tsung-Yi Lin, Ming-Yu Liu, Jun Zhu, Haoxiang Wang
Tags: OffRL, LRM
23 May 2025

MPO: Multilingual Safety Alignment via Reward Gap Optimization
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Weixiang Zhao, Yulin Hu, Yang Deng, Tongtong Wu, Wenxuan Zhang, ..., An Zhang, Yanyan Zhao, Bing Qin, Tat-Seng Chua, Ting Liu
22 May 2025

Reward Is Enough: LLMs Are In-Context Reinforcement Learners
Kefan Song, Amir Moeini, Peng Wang, Lei Gong, Rohan Chandra, Yanjun Qi, Shangtong Zhang
Tags: ReLM, LRM
21 May 2025

Revealing Language Model Trajectories via Kullback-Leibler Divergence
Ryo Kishino, Yusuke Takase, Momose Oyama, Hiroaki Yamagiwa, Hidetoshi Shimodaira
21 May 2025

Cross-Lingual Optimization for Language Transfer in Large Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Jungseob Lee, Seongtae Hong, Hyeonseok Moon, Heuiseok Lim
20 May 2025

SGDPO: Self-Guided Direct Preference Optimization for Language Model Alignment
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Wenqiao Zhu, Ji Liu, Lulu Wang, Jun Wu, Yulun Zhang
18 May 2025

SoLoPO: Unlocking Long-Context Capabilities in LLMs via Short-to-Long Preference Optimization
Huashan Sun, Shengyi Liao, Yansen Han, Yu Bai, Yang Gao, ..., Weizhou Shen, Fanqi Wan, Ming Yan, J.N. Zhang, Fei Huang
16 May 2025