ResearchTrend.AI
  • Papers
  • Communities
  • Organizations
  • Events
  • Blog
  • Pricing
  • Feedback
  • Contact Sales
Papers
Communities
Social Events
Terms and Conditions
Pricing
Contact Sales
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2505.20686
  4. Cited By
Accelerating RL for LLM Reasoning with Optimal Advantage Regression

Accelerating RL for LLM Reasoning with Optimal Advantage Regression

27 May 2025
Kianté Brantley
Mingyu Chen
Zhaolin Gao
Jason D. Lee
Wen Sun
Wenhao Zhan
Xuezhou Zhang
    OffRLLRM
ArXiv (abs)PDFHTMLHuggingFace (2 upvotes)

Papers citing "Accelerating RL for LLM Reasoning with Optimal Advantage Regression"

16 / 16 papers shown
Title
Single-stream Policy Optimization
Single-stream Policy Optimization
Zhongwen Xu
Zihan Ding
OffRL
68
1
0
16 Sep 2025
Understanding Reinforcement Learning for Model Training, and future directions with GRAPE
Understanding Reinforcement Learning for Model Training, and future directions with GRAPE
Rohit Patel
OffRL
4
0
0
02 Sep 2025
Decomposing the Entropy-Performance Exchange: The Missing Keys to Unlocking Effective Reinforcement Learning
Decomposing the Entropy-Performance Exchange: The Missing Keys to Unlocking Effective Reinforcement Learning
Jia Deng
Jie Chen
Zhipeng Chen
Wayne Xin Zhao
Ji-Rong Wen
LRM
33
2
0
04 Aug 2025
$\texttt{SPECS}$: Faster Test-Time Scaling through Speculative Drafts
SPECS\texttt{SPECS}SPECS: Faster Test-Time Scaling through Speculative Drafts
Mert Cemri
Nived Rajaraman
Rishabh Tiwari
Xiaoxuan Liu
Kurt Keutzer
Ion Stoica
Kannan Ramchandran
Ahmad Beirami
Ziteng Sun
LRM
107
1
0
15 Jun 2025
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
Yang Yue
Zhiqi Chen
Rui Lu
Andrew Zhao
Zhaokai Wang
Yang Yue
Shiji Song
Gao Huang
ReLMLRM
353
266
0
18 Apr 2025
A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce
A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce
Wei Xiong
Jiarui Yao
Yuhui Xu
Bo Pang
Lei Wang
...
Junnan Li
Nan Jiang
Tong Zhang
Caiming Xiong
Hanze Dong
OffRLLRM
208
58
0
15 Apr 2025
Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining
Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining
Rosie Zhao
Alexandru Meterez
Sham Kakade
Cengiz Pehlevan
Samy Jelassi
Eran Malach
ReLMLRM
512
45
0
10 Apr 2025
Understanding R1-Zero-Like Training: A Critical Perspective
Understanding R1-Zero-Like Training: A Critical Perspective
Zichen Liu
Changyu Chen
Wenjun Li
Penghui Qi
Tianyu Pang
Chao Du
Wee Sun Lee
Jialin Li
OffRLLRM
311
325
0
26 Mar 2025
SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild
SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild
Weihao Zeng
Yuzhen Huang
Qian Liu
Wei Liu
Keqing He
Zejun Ma
Junxian He
OffRLReLMLRM
349
229
0
24 Mar 2025
DAPO: An Open-Source LLM Reinforcement Learning System at Scale
DAPO: An Open-Source LLM Reinforcement Learning System at Scale
Qiying Yu
Zheng Zhang
Ruofei Zhu
Yufeng Yuan
Xiaochen Zuo
...
Ya Zhang
Lin Yan
Mu Qiao
Yonghui Wu
Mingxuan Wang
OffRLLRM
368
499
0
18 Mar 2025
Cognitive Behaviors that Enable Self-Improving Reasoners, or, Four Habits of Highly Effective STaRs
Cognitive Behaviors that Enable Self-Improving Reasoners, or, Four Habits of Highly Effective STaRs
Kanishk Gandhi
Ayush Chakravarthy
Anikait Singh
Nathan Lile
Noah D. Goodman
ReLMLRM
289
185
0
03 Mar 2025
Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF
Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF
Shicong Cen
Jincheng Mei
Katayoon Goshvadi
Hanjun Dai
Tong Yang
Sherry Yang
Dale Schuurmans
Yuejie Chi
Bo Dai
OffRL
228
44
0
20 Feb 2025
Analysis of Diffusion Models for Manifold Data
Analysis of Diffusion Models for Manifold Data
Anand Jerry George
Rodrigo Veiga
Nicolas Macris
DiffM
129
3
0
01 Feb 2025
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
DeepSeek-AI
Daya Guo
Dejian Yang
Haowei Zhang
Junxiao Song
...
Shiyu Wang
S. Yu
Shunfeng Zhou
Shuting Pan
S.S. Li
ReLMVLMOffRLAI4TSLRM
582
3,403
0
22 Jan 2025
Enhancing Multi-Step Reasoning Abilities of Language Models through Direct Q-Function Optimization
Enhancing Multi-Step Reasoning Abilities of Language Models through Direct Q-Function Optimization
Guanlin Liu
Kaixuan Ji
Ning Dai
Zheng Wu
Chen Dun
Q. Gu
Lin Yan
Quanquan Gu
Lin Yan
OffRLLRM
192
14
0
11 Oct 2024
Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF
Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF
Zhaolin Gao
Wenhao Zhan
Jonathan D. Chang
Gokul Swamy
Kianté Brantley
Jason D. Lee
Wen Sun
OffRL
215
8
0
06 Oct 2024
1