arXiv: 2502.01456
Process Reinforcement through Implicit Rewards

3 February 2025
Ganqu Cui
Lifan Yuan
Liang Luo
Hanbin Wang
Wendi Li
Bingxiang He
Tianyu Yu
Qixin Xu
Weize Chen
Huayu Chen
Kaiyan Zhang
Xingtai Lv
Xu Han
Yuan Yao
Yu Cheng
Zhiyuan Liu
Maosong Sun
Ning Ding
Bowen Zhou
OffRL, LRM
ArXiv (abs) | PDF | HTML | HuggingFace (62 upvotes)

Papers citing "Process Reinforcement through Implicit Rewards"

50 / 161 papers shown
Title
Bridging Supervised Learning and Reinforcement Learning in Math Reasoning
Huayu Chen
Kaiwen Zheng
Qinsheng Zhang
Ganqu Cui
Yin Cui
Haotian Ye
Tsung-Yi Lin
Ming-Yu Liu
Jun Zhu
Haoxiang Wang
OffRL, LRM
409
14
0
23 May 2025
Tool-Star: Empowering LLM-Brained Multi-Tool Reasoner via Reinforcement Learning
Guanting Dong
Yifei Chen
Xiaoxi Li
Jiajie Jin
Hongjin Qian
Yutao Zhu
Hangyu Mao
Guorui Zhou
Ji-Rong Wen
LLMAG, SyDa, LRM
289
26
0
22 May 2025
Reasoning Model is Stubborn: Diagnosing Instruction Overriding in Reasoning Models
Doohyuk Jang
Yoonjeon Kim
Chanjae Park
Hyun Ryu
Eunho Yang
LRM
157
1
0
22 May 2025
SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward
Kaixuan Fan
Kaituo Feng
Haoming Lyu
Dongzhan Zhou
Xiangyu Yue
ReLM, LRM
270
20
0
22 May 2025
TemplateRL: Structured Template-Guided Reinforcement Learning for LLM Reasoning
Jinyang Wu
Chonghua Liao
Mingkuan Feng
Shuai Zhang
Zhengqi Wen
Pengpeng Shao
Huazhe Xu
Jianhua Tao
OffRL, LRM
387
13
0
21 May 2025
The Unreasonable Effectiveness of Entropy Minimization in LLM Reasoning
Shivam Agarwal
Zimin Zhang
Lifan Yuan
Jiawei Han
Yuan Yao
371
74
0
21 May 2025
AAPO: Enhancing the Reasoning Capabilities of LLMs with Advantage Momentum
Jian Xiong
Jingbo Zhou
Jingyong Ye
Qiang Huang
Dejing Dou
LRM
242
1
0
20 May 2025
SCOPE: Compress Mathematical Reasoning Steps for Efficient Automated Process Annotation
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Huimin Xu
Xin Mao
Feng-Lin Li
Xiaobao Wu
Wang Chen
Wei Zhang
Anh Tuan Luu
152
1
0
20 May 2025
Optimizing Anytime Reasoning via Budget Relative Policy Optimization
Penghui Qi
Zichen Liu
Tianyu Pang
Chao Du
W. Lee
Min Lin
OffRL, LRM
295
11
0
19 May 2025
MARGE: Improving Math Reasoning for LLMs with Guided Exploration
Jingyue Gao
Runji Lin
Keming Lu
Bowen Yu
Junyang Lin
Jianyu Chen
LRM
219
1
0
18 May 2025
Reinforcing Multi-Turn Reasoning in LLM Agents via Turn-Level Reward Design
Siliang Zeng
Quan Wei
William Brown
Oana Frunza
...
Anderson Schneider
Yuriy Nevmyvaka
Yang Katie Zhao
Alfredo García
Mingyi Hong
LRM
295
22
0
17 May 2025
Reinforcement Learning Finetunes Small Subnetworks in Large Language Models
Sagnik Mukherjee
Lifan Yuan
Dilek Hakkani-Tur
Yuan Yao
223
13
0
16 May 2025
SelfBudgeter: Adaptive Token Allocation for Efficient LLM Reasoning
Zheng Li
Qingxiu Dong
Jingyuan Ma
Di Zhang
Kai Jia
Lei Sha
LRM
328
18
0
16 May 2025
Agent RL Scaling Law: Agent RL with Spontaneous Code Execution for Mathematical Problem Solving
Xinji Mai
Haotian Xu
Zhong-Zhi Li
X. Wu
Weinong Wang
J. Hu
Yingying Zhang
Wenqiang Zhang
ReLM, LRM
468
3
0
12 May 2025
R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning
Yi-Fan Zhang
Xingyu Lu
X. Hu
Chaoyou Fu
Bin Wen
...
Jianfei Chen
Fan Yang
Zheng Zhang
Yan Li
Liang Wang
OffRL, LRM
272
31
0
05 May 2025
Sailing by the Stars: A Survey on Reward Models and Learning Strategies for Learning from Rewards
Xiaobao Wu
LRM
426
5
0
05 May 2025
RM-R1: Reward Modeling as Reasoning
Xiusi Chen
Gaotang Li
Xiping Hu
Sara Szymkuć
Cheng Qian
...
Yu Zhang
D. Zhang
Tong Zhang
Hanghang Tong
Heng Ji
ReLM, OffRL, LRM
714
66
0
05 May 2025
DeepCritic: Deliberate Critique with Large Language Models
Wenkai Yang
Jingwen Chen
Yankai Lin
Ji-Rong Wen
ALM, LRM
223
8
0
01 May 2025
Which Agent Causes Task Failures and When? On Automated Failure Attribution of LLM Multi-Agent Systems
Shaokun Zhang
Ming Yin
Jieyu Zhang
Jing Liu
Zhiguang Han
...
Beibin Li
Chi Wang
Hongru Wang
Yuxiao Chen
Qingyun Wu
558
32
0
30 Apr 2025
Fast-Slow Thinking GRPO for Large Vision-Language Model Reasoning
W. L. Xiao
Yaoyao Yu
ReLM, LRM, AI4CE
988
27
0
25 Apr 2025
Tina: Tiny Reasoning Models via LoRA
Shangshang Wang
Julian Asilis
Ömer Faruk Akgül
Enes Burak Bilgin
Ollie Liu
Willie Neiswanger
OffRL, LRM
264
15
0
22 Apr 2025
TTRL: Test-Time Reinforcement Learning
Yuxin Zuo
Kaiyan Zhang
Li Sheng
Xuekai Zhu
...
Youbang Sun
Zhiyuan Ma
Lifan Yuan
Ning Ding
Bowen Zhou
OffRL
1.1K
110
0
22 Apr 2025
Stop Summation: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning
Jie Cheng
Ruixi Qiao
Lijun Li
Chao Guo
Gang Xiong
Yisheng Lv
Fei-Yue Wang
LRM
749
18
0
21 Apr 2025
Learning to Reason under Off-Policy Guidance
Jianhao Yan
Yafu Li
Zican Hu
Zhi Wang
Ganqu Cui
Xiaoye Qu
Yu Cheng
Yue Zhang
OffRL, LRM
631
93
0
21 Apr 2025
Not All Rollouts are Useful: Down-Sampling Rollouts in LLM Reinforcement Learning
Yixuan Even Xu
Yash Savani
Fei Fang
Zico Kolter
OffRL
301
36
0
18 Apr 2025
NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation
Xiangyan Liu
Jinjie Ni
Zijian Wu
Chao Du
Longxu Dou
Jian Shu
Tianyu Pang
Michael Shieh
OffRL, LRM
1.1K
46
0
17 Apr 2025
Nemotron-CrossThink: Scaling Self-Learning beyond Math Reasoning
Syeda Nahida Akter
Shrimai Prabhumoye
Matvei Novikov
Seungju Han
Ying Lin
...
Eric Nyberg
Yejin Choi
M. Patwary
Mohammad Shoeybi
Bryan Catanzaro
ReLM, OffRL, LRM
926
16
1
15 Apr 2025
M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models
Junxiong Wang
Wen-Ding Li
Daniele Paliotta
Daniel Ritter
Alexander M. Rush
Tri Dao
LRM
264
10
0
14 Apr 2025
Efficient Process Reward Model Training via Active Learning
Keyu Duan
Zichen Liu
Xin Mao
Tianyu Pang
Changyu Chen
Qiguang Chen
Michael Shieh
Longxu Dou
LRM
167
8
0
14 Apr 2025
MT-R1-Zero: Advancing LLM-based Machine Translation via R1-Zero-like Reinforcement Learning
Zhaopeng Feng
Shaosheng Cao
Jiahan Ren
Jiayuan Su
Ruizhe Chen
Yan Zhang
Zhe Xu
Yao Hu
Jian Wu
Zuozhu Liu
ALM, LRM
331
27
0
14 Apr 2025
How Instruction and Reasoning Data shape Post-Training: Data Quality through the Lens of Layer-wise Gradients
Ming Li
Yongqian Li
Ziyue Li
Tianyi Zhou
LRM
207
7
0
14 Apr 2025
Leveraging Reasoning Model Answers to Enhance Non-Reasoning Model Capability
Haotian Wang
Han Zhao
Shuaiting Chen
Xiaoyu Tian
Sitong Zhao
Yunjie Ji
Yiping Peng
Xiangang Li
ReLM, LRM
189
1
0
13 Apr 2025
Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining
Rosie Zhao
Alexandru Meterez
Sham Kakade
Cengiz Pehlevan
Samy Jelassi
Eran Malach
ReLM, LRM
720
66
0
10 Apr 2025
A Sober Look at Progress in Language Model Reasoning: Pitfalls and Paths to Reproducibility
Andreas Hochlehnert
Hardik Bhatnagar
Vishaal Udandarao
Samuel Albanie
Christian Schroeder de Witt
Matthias Bethge
ReLM, ALM, LRM
478
63
0
09 Apr 2025
Right Question is Already Half the Answer: Fully Unsupervised LLM Reasoning Incentivization
Qingyang Zhang
Haitao Wu
Changqing Zhang
Peilin Zhao
Yatao Bian
ReLM, LRM
539
61
0
08 Apr 2025
Efficient Reinforcement Finetuning via Adaptive Curriculum Learning
Taiwei Shi
Yiyang Wu
Linxin Song
Wanrong Zhu
Jieyu Zhao
LRM
331
48
0
07 Apr 2025
Synthetic Data Generation & Multi-Step RL for Reasoning & Tool Use
Anna Goldie
Azalia Mirhoseini
Hao Zhou
Irene Cai
Christopher D. Manning
SyDa, OffRL, ReLM, LRM
362
28
0
07 Apr 2025
MultiClear: Multimodal Soft Exoskeleton Glove for Transparent Object Grasping Assistance
Towards Autonomous Robotic Systems (TAROS), 2025
Chen Hu
Timothy Neate
Shan Luo
Letizia Gionfrida
198
40
0
04 Apr 2025
ThinkPrune: Pruning Long Chain-of-Thought of LLMs via Reinforcement Learning
Bairu Hou
Yang Zhang
Jiabao Ji
Yujian Liu
Kaizhi Qian
Jacob Andreas
Shiyu Chang
OffRL, LRM
238
73
0
02 Apr 2025
How Difficulty-Aware Staged Reinforcement Learning Enhances LLMs' Reasoning Capabilities: A Preliminary Experimental Study
Yunjie Ji
Sitong Zhao
Xiaoyu Tian
Haotian Wang
Shuaiting Chen
Yiping Peng
Han Zhao
Xiangang Li
LRM
187
11
0
01 Apr 2025
Z1: Efficient Test-time Scaling with Code
Zhaojian Yu
Yinghao Wu
Yilun Zhao
Arman Cohan
Jinqiang Cui
LRM
277
27
0
01 Apr 2025
Do We Truly Need So Many Samples? Multi-LLM Repeated Sampling Efficiently Scales Test-Time Compute
Jianhao Chen
Zishuo Xun
Bocheng Zhou
Han Qi
Qiaosheng Zhang
...
Wei Hu
Yuzhong Qu
W. Ouyang
Wanli Ouyang
Shuyue Hu
434
13
0
01 Apr 2025
Learning to Reason for Long-Form Story Generation
Alexander Gurung
Mirella Lapata
ReLM, OffRL, LRM
289
14
0
28 Mar 2025
Reasoning Beyond Limits: Advances and Open Problems for LLMs
ICT Express, 2025
M. Ferrag
Norbert Tihanyi
Merouane Debbah
ELM, OffRL, LRM, AI4CE
736
13
0
26 Mar 2025
Understanding R1-Zero-Like Training: A Critical Perspective
Zichen Liu
Changyu Chen
Wenjun Li
Penghui Qi
Tianyu Pang
Chao Du
Wee Sun Lee
Jialin Li
OffRL, LRM
430
538
0
26 Mar 2025
Scaling Evaluation-time Compute with Reasoning Models as Process Evaluators
Seungone Kim
Ian Wu
Jinu Lee
Xiang Yue
Seongyun Lee
...
Kiril Gashteovski
Carolin (Haas) Lawrence
Anjali Narayan-Chen
Graham Neubig
Sean Welleck
LRM
244
14
0
25 Mar 2025
Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't
Quy-Anh Dang
Chris Ngo
OffRL, LRM
285
41
0
20 Mar 2025
Aligning Multimodal LLM with Human Preference: A Survey
Tao Yu
Yujiao Shi
Chaoyou Fu
Junkang Wu
Jinda Lu
...
Qingsong Wen
Zheng Zhang
Yan Huang
Liang Wang
Tieniu Tan
721
12
0
18 Mar 2025
DAPO: An Open-Source LLM Reinforcement Learning System at Scale
Qiying Yu
Zheng Zhang
Ruofei Zhu
Yufeng Yuan
Xiaochen Zuo
...
Ya Zhang
Lin Yan
Mu Qiao
Yonghui Wu
Mingxuan Wang
OffRL, LRM
504
897
0
18 Mar 2025
Enhancing LLM Reasoning with Iterative DPO: A Comprehensive Empirical Investigation
Songjun Tu
Jiahao Lin
Xiangyu Tian
Qichao Zhang
Linjing Li
...
Nan Xu
Wei He
Xiangyuan Lan
Shihong Deng
Dongbin Zhao
LRM
425
13
0
17 Mar 2025