2409.19605
Cited By
The Crucial Role of Samplers in Online Direct Preference Optimization
29 September 2024
Ruizhe Shi
Runlong Zhou
Simon S. Du
Papers citing "The Crucial Role of Samplers in Online Direct Preference Optimization"
4 / 4 papers shown
Title
SIMPLEMIX: Frustratingly Simple Mixing of Off- and On-policy Data in Language Model Preference Learning
Tianjian Li
Daniel Khashabi
50
0
0
05 May 2025
Ordering-based Conditions for Global Convergence of Policy Gradient Methods
Jincheng Mei
Bo Dai
Alekh Agarwal
Mohammad Ghavamzadeh
Csaba Szepesvári
Dale Schuurmans
50
4
0
02 Apr 2025
Can RLHF be More Efficient with Imperfect Reward Models? A Policy Coverage Perspective
Jiawei Huang
Bingcong Li
Christoph Dann
Niao He
OffRL
61
0
0
26 Feb 2025
Larger or Smaller Reward Margins to Select Preferences for Alignment?
Kexin Huang
Junkang Wu
Ziqian Chen
Xue Wang
Jinyang Gao
Bolin Ding
Jiancan Wu
Xiangnan He
X. Wang
35
0
0
25 Feb 2025