2403.19443
Cited By
Mixed Preference Optimization: Reinforcement Learning with Data Selection and Better Reference Model
28 March 2024
Qi Gou
Cam-Tu Nguyen
Papers citing
"Mixed Preference Optimization: Reinforcement Learning with Data Selection and Better Reference Model"
7 / 7 papers shown
Title
Skywork R1V2: Multimodal Hybrid Reinforcement Learning for Reasoning
Chris
Yichen Wei
Yi Peng
X. Wang
Weijie Qiu
...
Jianhao Zhang
Y. Hao
Xuchen Song
Yang Liu
Yahui Zhou
OffRL
AI4TS
SyDa
LRM
VLM
74
0
0
23 Apr 2025
A Comprehensive Survey of Reward Models: Taxonomy, Applications, Challenges, and Future
Jialun Zhong
Wei Shen
Yanzeng Li
Songyang Gao
Hua Lu
Yicheng Chen
Yang Zhang
Wei Zhou
Jinjie Gu
Lei Zou
LRM
38
2
0
12 Apr 2025
Modality-Fair Preference Optimization for Trustworthy MLLM Alignment
Songtao Jiang
Yan Zhang
Ruizhe Chen
Yeying Jin
Zuozhu Liu
MLLM
MoE
19
6
0
20 Oct 2024
Prompt Optimization with Human Feedback
Xiaoqiang Lin
Zhongxiang Dai
Arun Verma
See-Kiong Ng
P. Jaillet
K. H. Low
AAML
29
8
0
27 May 2024
Filtered Direct Preference Optimization
Tetsuro Morimura
Mitsuki Sakamoto
Yuu Jinnai
Kenshi Abe
Kaito Ariu
35
13
0
22 Apr 2024
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
303
11,730
0
04 Mar 2022
Fine-Tuning Language Models from Human Preferences
Daniel M. Ziegler
Nisan Stiennon
Jeff Wu
Tom B. Brown
Alec Radford
Dario Amodei
Paul Christiano
G. Irving
ALM
275
1,561
0
18 Sep 2019