2502.01456
v1
v2 (latest)
Process Reinforcement through Implicit Rewards
3 February 2025
Ganqu Cui
Lifan Yuan
Liang Luo
Hanbin Wang
Wendi Li
Bingxiang He
Tianyu Yu
Qixin Xu
Weize Chen
Huayu Chen
Kaiyan Zhang
Xingtai Lv
Xu Han
Yuan Yao
Yu Cheng
Zhiyuan Liu
Maosong Sun
Ning Ding
Bowen Zhou
OffRL
LRM
HuggingFace (62 upvotes)
Papers citing "Process Reinforcement through Implicit Rewards"
50 / 161 papers shown
BudgetThinker: Empowering Budget-aware LLM Reasoning with Control Tokens
Hao Wen
Xinrui Wu
Yi Sun
Feifei Zhang
Liye Chen
Jie Wang
Yunxin Liu
Yunhao Liu
Y. Zhang
Yuanchun Li
LRM
125
5
0
24 Aug 2025
Your Reward Function for RL is Your Best PRM for Search: Unifying RL and Search-Based TTS
Can Jin
Yang Zhou
Qixin Zhang
Hongwu Peng
Di Zhang
Marco Pavone
Ligong Han
Zhang-Wei Hong
Tong Che
Dimitris N. Metaxas
OffRL
LRM
184
3
0
19 Aug 2025
G²RPO-A: Guided Group Relative Policy Optimization with Adaptive Guidance
Yongxin Guo
Wenbo Deng
Zhenglin Cheng
Xiaoying Tang
LRM
104
2
0
18 Aug 2025
Hard Examples Are All You Need: Maximizing GRPO Post-Training Under Annotation Budgets
Benjamin Pikus
Pratyush Ranjan Tiwari
Burton Ye
208
4
0
15 Aug 2025
SSRL: Self-Search Reinforcement Learning
Wendi Li
Kaiyan Zhang
Heng Zhou
Yuxin Zuo
Yanxu Chen
...
Z. He
Bingning Wang
Wenlong Zhang
Ning Ding
Bowen Zhou
LLMAG
LRM
83
4
0
14 Aug 2025
Pruning Long Chain-of-Thought of Large Reasoning Models via Small-Scale Preference Optimization
Bin Hong
Jiayu Liu
Zhenya Huang
Kai Zhang
Mengdi Zhang
LRM
126
0
0
13 Aug 2025
EvoCoT: Overcoming the Exploration Bottleneck in Reinforcement Learning
Huanyu Liu
Jia Li
Chang Yu
Taozhi Chen
Yihong Dong
Lecheng Wang
Hu XiaoLong
Ge Li
OffRL
LRM
244
2
0
11 Aug 2025
AMFT: Aligning LLM Reasoners by Meta-Learning the Optimal Imitation-Exploration Balance
Lixuan He
Jie Feng
Yong Li
OffRL
LRM
142
2
0
09 Aug 2025
Sample-efficient LLM Optimization with Reset Replay
Zichuan Liu
Jinyu Wang
Lei Song
Jiang Bian
OffRL
123
0
0
08 Aug 2025
Posterior-GRPO: Rewarding Reasoning Processes in Code Generation
Lishui Fan
Yu Zhang
Mouxiang Chen
Zhongxin Liu
OffRL
LRM
76
10
0
07 Aug 2025
TempFlow-GRPO: When Timing Matters for GRPO in Flow Models
Xiaoxuan He
Siming Fu
Yuke Zhao
W. Li
Zhiqiang Wang
Dacheng Yin
Fengyun Rao
Bo Zhang
AI4CE
192
14
0
06 Aug 2025
Sotopia-RL: Reward Design for Social Intelligence
Haofei Yu
Zhengyang Qi
Yining Zhao
Kolby Nottingham
Keyang Xuan
Bodhisattwa Prasad Majumder
Hao Zhu
Paul Pu Liang
Jiaxuan You
OffRL
164
4
0
05 Aug 2025
Self-Questioning Language Models
Lili Chen
Mihir Prabhudesai
Katerina Fragkiadaki
Hao Liu
Deepak Pathak
ReLM
SyDa
LRM
334
13
0
05 Aug 2025
Enhancing Math Reasoning in Small-sized LLMs via Preview Difficulty-Aware Intervention
Xinhan Di
JoyJiaoW
LRM
91
2
0
03 Aug 2025
RL-PLUS: Countering Capability Boundary Collapse of LLMs in Reinforcement Learning with Hybrid-policy Optimization
Yihong Dong
Xue Jiang
Yongding Tao
Huanyu Liu
Kechi Zhang
...
Binhua Li
Zhi Jin
Fei Huang
Y. Li
Ge Li
LRM
249
16
0
31 Jul 2025
VL-Cogito: Progressive Curriculum Reinforcement Learning for Advanced Multimodal Reasoning
Ruifeng Yuan
Chenghao Xiao
Sicong Leng
Jianyu Wang
Long Li
...
Deli Zhao
Qifeng Bai
Zhongyu Wei
H. Zhang
Yu Rong
OffRL
ReLM
LRM
194
9
0
30 Jul 2025
EDGE-GRPO: Entropy-Driven GRPO with Guided Error Correction for Advantage Diversity
Xingjian Zhang
Siwei Wen
Wenjun Wu
Lei Huang
92
10
0
29 Jul 2025
Geometric-Mean Policy Optimization
Yuzhong Zhao
Yue Liu
Junpeng Liu
Jingye Chen
Xun Wu
...
Shaohan Huang
Lei Cui
Qixiang Ye
Fang Wan
Furu Wei
169
20
0
28 Jul 2025
PurpCode: Reasoning for Safer Code Generation
Jiawei Liu
Nirav Diwan
Zhe Wang
Haoyu Zhai
Xiaona Zhou
...
Hadjer Benkraouda
Yuxiang Wei
Lingming Zhang
Ismini Lourentzou
Gang Wang
SILM
LRM
ELM
350
5
0
25 Jul 2025
Predictive Scaling Laws for Efficient GRPO Training of Large Reasoning Models
Datta Nimmaturi
Vaishnavi Bhargava
Rajat Ghosh
Johnu George
Debojyoti Dutta
LRM
90
2
0
24 Jul 2025
Reinforcement Learning Fine-Tunes a Sparse Subnetwork in Large Language Models
Andrii Balashov
164
0
0
23 Jul 2025
Rubrics as Rewards: Reinforcement Learning Beyond Verifiable Domains
Anisha Gunjal
Anthony Wang
Elaine Lau
Vaskar Nath
Bing Liu
Sean Hendryx
OffRL
151
40
0
23 Jul 2025
Stabilizing Knowledge, Promoting Reasoning: Dual-Token Constraints for RLVR
Jiakang Wang
Runze Liu
Fuzheng Zhang
Xiu Li
Guorui Zhou
OffRL
121
15
0
21 Jul 2025
VAR-MATH: Probing True Mathematical Reasoning in LLMS via Symbolic Multi-Instance Benchmarks
Jian Yao
Ran Cheng
Kay Chen Tan
OffRL
LRM
80
1
0
17 Jul 2025
Discrete Diffusion Trajectory Alignment via Stepwise Decomposition
Jiaqi Han
Austin Wang
Minkai Xu
Wenda Chu
Meihua Dang
Yisong Yue
Stefano Ermon
115
4
0
07 Jul 2025
Can Prompt Difficulty be Online Predicted for Accelerating RL Finetuning of Reasoning Models?
Yun Qu
Qi Wang
Yixiu Mao
Vincent Tao Hu
Bjorn Ommer
Xiangyang Ji
OffRL
LRM
198
12
0
07 Jul 2025
ReasonFlux-PRM: Trajectory-Aware PRMs for Long Chain-of-Thought Reasoning in LLMs
Jiaru Zou
Ling Yang
Jingwen Gu
Jiahao Qiu
Ke Shen
Jingrui He
M. Y. Wang
ReLM
LRM
141
15
0
23 Jun 2025
Reinforcement Learning with Verifiable Rewards Implicitly Incentivizes Correct Reasoning in Base LLMs
Xumeng Wen
Zihan Liu
Shun Zheng
Shengyu Ye
...
Yang Wang
Junjie Li
Ziming Miao
Jiang Bian
Mao Yang
LRM
277
51
0
17 Jun 2025
Reasoning with Exploration: An Entropy Perspective
Daixuan Cheng
Shaohan Huang
Xuekai Zhu
Bo Dai
Wayne Xin Zhao
Zhenliang Zhang
Furu Wei
LRM
237
109
0
17 Jun 2025
Personalized LLM Decoding via Contrasting Personal Preference
Hyungjune Bu
Chanjoo Jung
Minjae Kang
Jaehyung Kim
185
1
0
13 Jun 2025
Athena: Enhancing Multimodal Reasoning with Data-efficient Process Reward Models
Shuai Wang
Zhenhua Liu
Jiaheng Wei
Xuanwu Yin
Dong Li
E. Barsoum
LRM
234
10
0
11 Jun 2025
Intra-Trajectory Consistency for Reward Modeling
Chaoyang Zhou
Shunyu Liu
Zengmao Wang
Di Wang
Rong-Cheng Tu
Bo Du
Dacheng Tao
305
0
0
10 Jun 2025
ADAM: Autonomous Discovery and Annotation Model using LLMs for Context-Aware Annotations
Amirreza Rouhi
Solmaz Arezoomandan
Knut Peterson
Joseph T. Woods
David Han
VLM
143
11
0
10 Jun 2025
Learning What Reinforcement Learning Can't: Interleaved Online Fine-Tuning for Hardest Questions
Lu Ma
Hao Liang
Meiyi Qiang
Lexiang Tang
Xiaochen Ma
...
Chengyu Shen
Runming He
Bin Cui
Wentao Zhang
ReLM
OffRL
LRM
172
37
0
09 Jun 2025
Topology of Reasoning: Understanding Large Reasoning Models through Reasoning Graph Properties
Gouki Minegishi
Hiroki Furuta
Takeshi Kojima
Yusuke Iwasawa
Y. Matsuo
LRM
938
9
0
06 Jun 2025
SPARTA ALIGNMENT: Collectively Aligning Multiple Language Models through Combat
Yuru Jiang
Wenxuan Ding
Shangbin Feng
Greg Durrett
Yulia Tsvetkov
229
2
0
05 Jun 2025
Improving Data Efficiency for LLM Reinforcement Fine-tuning Through Difficulty-targeted Online Data Selection and Rollout Replay
Yifan Sun
Jingyan Shen
Yibin Wang
Tianyu Chen
Zhendong Wang
Mingyuan Zhou
Huan Zhang
305
9
0
05 Jun 2025
FreePRM: Training Process Reward Models Without Ground Truth Process Labels
Lin Sun
C. Liu
Xiaofeng Ma
Tao Yang
Weijia Lu
Ning Wu
157
6
0
04 Jun 2025
Response-Level Rewards Are All You Need for Online Reinforcement Learning in LLMs: A Mathematical Perspective
Shenghua He
Tian Xia
Xuan Zhou
Hui Wei
OffRL
170
2
0
03 Jun 2025
Towards Effective Code-Integrated Reasoning
Fei Bai
Yingqian Min
Beichen Zhang
Zhipeng Chen
Wayne Xin Zhao
Lei Fang
Zheng Liu
Zhongyuan Wang
Ji-Rong Wen
OffRL
LRM
120
10
0
30 May 2025
Diversity-Aware Policy Optimization for Large Language Model Reasoning
Jian Yao
Ran Cheng
Xingyu Wu
Jibin Wu
Kay Chen Tan
LRM
219
14
0
29 May 2025
Segment Policy Optimization: Effective Segment-Level Credit Assignment in RL for Large Language Models
Yiran Guo
Lijie Xu
Jie Liu
Dan Ye
Delin Qu
OffRL
244
13
0
29 May 2025
The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models
Ganqu Cui
Yuchen Zhang
Jiacheng Chen
Lifan Yuan
Zhi Wang
...
Lei Bai
Wanli Ouyang
Yu Cheng
Bowen Zhou
Ning Ding
LRM
186
189
0
28 May 2025
Reinforced Reasoning for Embodied Planning
Di Wu
Jiaxin Fan
Junzhe Zang
G. Wang
Wei Yin
Wenhao Li
Bo Jin
LRM
341
7
0
28 May 2025
Walk Before You Run! Concise LLM Reasoning via Reinforcement Learning
Mingyang Song
Mao Zheng
OffRL
LRM
244
6
0
27 May 2025
The Avengers: A Simple Recipe for Uniting Smaller Language Models to Challenge Proprietary Giants
Yiqun Zhang
Hao Li
Chenxu Wang
L. Chen
Qiaosheng Zhang
...
Xinrun Wang
Jia Xu
Mengwei He
Xuming He
Shuyue Hu
339
13
0
26 May 2025
Interleaved Reasoning for Large Language Models via Reinforcement Learning
Roy Xie
David Qiu
Deepak Gopinath
Dong Lin
Yanchao Sun
Chong-Jun Wang
Saloni Potdar
Bhuwan Dhingra
KELM
LRM
220
5
0
26 May 2025
Understanding the Performance Gap in Preference Learning: A Dichotomy of RLHF and DPO
Ruizhe Shi
Minhak Song
Runlong Zhou
Zihan Zhang
Maryam Fazel
S. S. Du
240
5
0
26 May 2025
Token-Importance Guided Direct Preference Optimization
Yang Ning
Lin Hai
Liu Yibo
Tian Baoliang
Liu Guoqing
Zhang Haijun
183
0
0
26 May 2025
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning
Fanqi Wan
Weizhou Shen
Shengyi Liao
Yingcheng Shi
Chenliang Li
Ziyi Yang
Ji Zhang
Fei Huang
Jingren Zhou
Ming Yan
OffRL
LLMAG
ReLM
LRM
251
9
0
23 May 2025