Papers
Communities
Organizations
Events
Blog
Pricing
Feedback
Contact Sales
Search
Open menu
Home
Papers
All Papers
Title
Home
Papers
2505.20686
Cited By
Accelerating RL for LLM Reasoning with Optimal Advantage Regression
27 May 2025
Kianté Brantley
Mingyu Chen
Zhaolin Gao
Jason D. Lee
Wen Sun
Wenhao Zhan
Xuezhou Zhang
OffRL
LRM
Re-assign community
ArXiv (abs)
PDF
HTML
HuggingFace (2 upvotes)
Papers citing
"Accelerating RL for LLM Reasoning with Optimal Advantage Regression"
16 / 16 papers shown
Title
Single-stream Policy Optimization
Zhongwen Xu
Zihan Ding
OffRL
68
1
0
16 Sep 2025
Understanding Reinforcement Learning for Model Training, and future directions with GRAPE
Rohit Patel
OffRL
8
0
0
02 Sep 2025
Decomposing the Entropy-Performance Exchange: The Missing Keys to Unlocking Effective Reinforcement Learning
Jia Deng
Jie Chen
Zhipeng Chen
Wayne Xin Zhao
Ji-Rong Wen
LRM
33
2
0
04 Aug 2025
SPECS
\texttt{SPECS}
SPECS
: Faster Test-Time Scaling through Speculative Drafts
Mert Cemri
Nived Rajaraman
Rishabh Tiwari
Xiaoxuan Liu
Kurt Keutzer
Ion Stoica
Kannan Ramchandran
Ahmad Beirami
Ziteng Sun
LRM
111
1
0
15 Jun 2025
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
Yang Yue
Zhiqi Chen
Rui Lu
Andrew Zhao
Zhaokai Wang
Yang Yue
Shiji Song
Gao Huang
ReLM
LRM
353
266
0
18 Apr 2025
A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce
Wei Xiong
Jiarui Yao
Yuhui Xu
Bo Pang
Lei Wang
...
Junnan Li
Nan Jiang
Tong Zhang
Caiming Xiong
Hanze Dong
OffRL
LRM
208
58
0
15 Apr 2025
Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining
Rosie Zhao
Alexandru Meterez
Sham Kakade
Cengiz Pehlevan
Samy Jelassi
Eran Malach
ReLM
LRM
512
45
0
10 Apr 2025
Understanding R1-Zero-Like Training: A Critical Perspective
Zichen Liu
Changyu Chen
Wenjun Li
Penghui Qi
Tianyu Pang
Chao Du
Wee Sun Lee
Jialin Li
OffRL
LRM
311
325
0
26 Mar 2025
SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild
Weihao Zeng
Yuzhen Huang
Qian Liu
Wei Liu
Keqing He
Zejun Ma
Junxian He
OffRL
ReLM
LRM
349
229
0
24 Mar 2025
DAPO: An Open-Source LLM Reinforcement Learning System at Scale
Qiying Yu
Zheng Zhang
Ruofei Zhu
Yufeng Yuan
Xiaochen Zuo
...
Ya Zhang
Lin Yan
Mu Qiao
Yonghui Wu
Mingxuan Wang
OffRL
LRM
368
499
0
18 Mar 2025
Cognitive Behaviors that Enable Self-Improving Reasoners, or, Four Habits of Highly Effective STaRs
Kanishk Gandhi
Ayush Chakravarthy
Anikait Singh
Nathan Lile
Noah D. Goodman
ReLM
LRM
289
185
0
03 Mar 2025
Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF
Shicong Cen
Jincheng Mei
Katayoon Goshvadi
Hanjun Dai
Tong Yang
Sherry Yang
Dale Schuurmans
Yuejie Chi
Bo Dai
OffRL
228
44
0
20 Feb 2025
Analysis of Diffusion Models for Manifold Data
Anand Jerry George
Rodrigo Veiga
Nicolas Macris
DiffM
133
3
0
01 Feb 2025
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
DeepSeek-AI
Daya Guo
Dejian Yang
Haowei Zhang
Junxiao Song
...
Shiyu Wang
S. Yu
Shunfeng Zhou
Shuting Pan
S.S. Li
ReLM
VLM
OffRL
AI4TS
LRM
582
3,403
0
22 Jan 2025
Enhancing Multi-Step Reasoning Abilities of Language Models through Direct Q-Function Optimization
Guanlin Liu
Kaixuan Ji
Ning Dai
Zheng Wu
Chen Dun
Q. Gu
Lin Yan
Quanquan Gu
Lin Yan
OffRL
LRM
192
14
0
11 Oct 2024
Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF
Zhaolin Gao
Wenhao Zhan
Jonathan D. Chang
Gokul Swamy
Kianté Brantley
Jason D. Lee
Wen Sun
OffRL
215
8
0
06 Oct 2024
1