ResearchTrend.AI
Process Reinforcement through Implicit Rewards

3 February 2025
Ganqu Cui
Lifan Yuan
Liang Luo
Hanbin Wang
Wendi Li
Bingxiang He
Tianyu Yu
Qixin Xu
Weize Chen
Huayu Chen
Kaiyan Zhang
Xingtai Lv
Xu Han
Yuan Yao
Yu Cheng
Zhiyuan Liu
Maosong Sun
Ning Ding
Bowen Zhou
    OffRL, LRM
ArXiv (abs), PDF, HTML, HuggingFace (62 upvotes)

Papers citing "Process Reinforcement through Implicit Rewards"

50 / 161 papers shown
Reinforcing Action Policies by Prophesying
Jiahui Zhang
Ze Huang
Chun Gu
Zipei Ma
Li Zhang
108
0
0
25 Nov 2025
ST-PPO: Stabilized Off-Policy Proximal Policy Optimization for Multi-Turn Agents Training
Chenliang Li
Adel Elmahdy
Alex Boyd
Zhongruo Wang
Alfredo García
Parminder Bhatia
Taha A. Kass-Hout
Cao Xiao
Mingyi Hong
OffRL
123
0
0
25 Nov 2025
P1: Mastering Physics Olympiads with Reinforcement Learning
Jiacheng Chen
Qianjia Cheng
F. Yu
Haiyuan Wan
Yuchen Zhang
...
Yu Cheng
Ning Ding
Bowen Zhou
Peng Ye
Ganqu Cui
ReLM, LRM, AI4CE
228
1
0
17 Nov 2025
Reward and Guidance through Rubrics: Promoting Exploration to Improve Multi-Domain Reasoning
Baolong Bi
Shenghua Liu
Yiwei Wang
Siqian Tong
Lingrui Mei
Yuyao Ge
Yilong Xu
Jiafeng Guo
Xueqi Cheng
OffRL, LRM
164
3
0
15 Nov 2025
RLVE: Scaling Up Reinforcement Learning for Language Models with Adaptive Verifiable Environments
Zhiyuan Zeng
Hamish Ivison
Yiping Wang
Lifan Yuan
Shuyue Stella Li
...
S. Du
Natasha Jaques
Hao Peng
Pang Wei Koh
Hannaneh Hajishirzi
OffRL, LRM
68
2
0
10 Nov 2025
PACR: Progressively Ascending Confidence Reward for LLM Reasoning
Eunseop Yoon
Hee Suk Yoon
Jaehyun Jang
Soohwan Eom
Qi Dai
Chong Luo
Mark Hasegawa-Johnson
C. Yoo
LRM
98
0
0
25 Oct 2025
Language Ranker: A Lightweight Ranking framework for LLM Decoding
Chenheng Zhang
Tianqi Du
Jizhe Zhang
Mingqing Xiao
Yifei Wang
Yisen Wang
Zhouchen Lin
ALM
137
0
0
23 Oct 2025
What Defines Good Reasoning in LLMs? Dissecting Reasoning Steps with Multi-Aspect Evaluation
Heejin Do
Jaehui Hwang
Dongyoon Han
Seong Joon Oh
Sangdoo Yun
ELM, LRM
116
1
1
23 Oct 2025
CodeRL+: Improving Code Generation via Reinforcement with Execution Semantics Alignment
Xue Jiang
Yihong Dong
Mengyang Liu
Hongyi Deng
Tian Wang
...
Zhi Jin
Wenpin Jiao
Fei Huang
Yongbin Li
Ge Li
81
1
0
21 Oct 2025
Can Knowledge-Graph-based Retrieval Augmented Generation Really Retrieve What You Need?
Junchi Yu
Y. Liu
Jindong Gu
Philip Torr
Dongzhan Zhou
RALM
159
0
0
18 Oct 2025
Cog-Rethinker: Hierarchical Metacognitive Reinforcement Learning for LLM Reasoning
Zexu Sun
Yongcheng Zeng
Erxue Min
Heyang Gao
Bokai Ji
Xu Chen
OffRL, ReLM, LRM
147
0
0
13 Oct 2025
Unlocking Exploration in RLVR: Uncertainty-aware Advantage Shaping for Deeper Reasoning
Can Xie
Ruotong Pan
Xiangyu Wu
Y. Zhang
Jiayi Fu
Tingting Gao
G. Zhou
OffRL, LRM
76
0
0
12 Oct 2025
Pinpointing crucial steps: Attribution-based Credit Assignment for Verifiable Reinforcement Learning
Junxi Yin
Haisen Luo
Zhenyu Li
Yihua Liu
Dan Liu
Zequn Li
Xiaohang Xu
80
0
0
10 Oct 2025
Entropy Regularizing Activation: Boosting Continuous Control, Large Language Models, and Image Classification with Activation as Entropy Constraints
Zilin Kang
Chonghua Liao
Tingqiang Xu
Huazhe Xu
128
1
0
09 Oct 2025
TaTToo: Tool-Grounded Thinking PRM for Test-Time Scaling in Tabular Reasoning
Jiaru Zou
Soumya Roy
Vinay Kumar Verma
Z. Wang
David Wipf
Pan Lu
Sumit Negi
James Zou
Jingrui He
LMTD, LRM
133
0
0
07 Oct 2025
Webscale-RL: Automated Data Pipeline for Scaling RL Data to Pretraining Levels
Zhepeng Cen
H. Chen
Shiyu Wang
Zuxin Liu
Zhiwei Liu
Ding Zhao
Silvio Savarese
Caiming Xiong
Huan Wang
Weiran Yao
OffRL
81
1
0
07 Oct 2025
Mitigating Forgetting Between Supervised and Reinforcement Learning Yields Stronger Reasoners
Xiangchi Yuan
Xiang Chen
Tong Yu
Dachuan Shi
Can Jin
Wenke Lee
Saayan Mitra
CLL, OffRL, ReLM, LRM
98
0
0
06 Oct 2025
Aligning Perception, Reasoning, Modeling and Interaction: A Survey on Physical AI
Kun Xiang
Terry Jingchen Zhang
Yinya Huang
Jixi He
Zirong Liu
...
J. N. Han
Hang Xu
Han Li
Bin Dong
Xiaodan Liang
PINN, AI4CE
264
1
0
06 Oct 2025
Learning a Dense Reasoning Reward Model from Expert Demonstration via Inverse Reinforcement Learning
Claudio Fanconi
Nicolás Astorga
M. Schaar
LRM
137
1
1
02 Oct 2025
Step-Aware Policy Optimization for Reasoning in Diffusion Large Language Models
Shaoan Xie
Lingjing Kong
Xiangchen Song
Xinshuai Dong
Guangyi Chen
Eric P. Xing
Kun Zhang
LRM
85
3
0
02 Oct 2025
Enhancing Large Language Model Reasoning with Reward Models: An Analytical Survey
Qiyuan Liu
Hao Xu
Xuhong Chen
Wei Chen
Yee Whye Teh
Ning Miao
ReLM, LRM, AI4CE
230
0
0
02 Oct 2025
Plan Then Action: High-Level Planning Guidance Reinforcement Learning for LLM Reasoning
Zhihao Dou
Qinjian Zhao
Zhongwei Wan
Dinggen Zhang
Weida Wang
...
Qingtao Pan
Yang Ouyang
Zhiqiang Gao
Shufei Zhang
Sumon Biswas
LLMAG, LRM
123
2
0
02 Oct 2025
ExGRPO: Learning to Reason from Experience
Runzhe Zhan
Yafu Li
Zhi Wang
Xiaoye Qu
Dongrui Liu
Jing Shao
Derek F. Wong
Yu Cheng
OffRL, LRM
105
1
1
02 Oct 2025
Asymmetric Proximal Policy Optimization: mini-critics boost LLM reasoning
Jiashun Liu
J. Obando-Ceron
Han Lu
Yancheng He
Weixun Wang
Wenbo Su
Bo Zheng
Pablo Samuel Castro
Aaron Courville
L. Pan
OffRL, AAML
252
0
0
02 Oct 2025
Graph-S3: Enhancing Agentic textual Graph Retrieval with Synthetic Stepwise Supervision
Ge Chang
Jinbo Su
Jiacheng Liu
Pengfei Yang
Yuhao Shang
Huiwen Zheng
Hongli Ma
Yan Liang
Y. Li
Yunxin Liu
RALM, LRM
86
0
0
01 Oct 2025
Atomic Thinking of LLMs: Decoupling and Exploring Mathematical Reasoning Abilities
Jiayi Kuang
Haojing Huang
Yinghui Li
Xinnian Liang
Zhikun Xu
...
Xiaoyu Tan
Chao Qu
Meishan Zhang
Ying Shen
Philip S. Yu
LRM
107
4
0
30 Sep 2025
PRPO: Paragraph-level Policy Optimization for Vision-Language Deepfake Detection
Tuan Nguyen
Naseem Khan
Khang Tran
Nhathai Phan
Issa M. Khalil
114
0
0
30 Sep 2025
Diversity-Incentivized Exploration for Versatile Reasoning
Zican Hu
Shilin Zhang
Yafu Li
Jianhao Yan
Xuyang Hu
Leyang Cui
Xiaoye Qu
C. L. Philip Chen
Yu Cheng
Zhi Wang
LRM
111
2
0
30 Sep 2025
Linking Process to Outcome: Conditional Reward Modeling for LLM Reasoning
Zheng Zhang
Ziwei Shan
Kaitao Song
Yexin Li
Kan Ren
LRM
70
0
0
30 Sep 2025
Attention as a Compass: Efficient Exploration for Process-Supervised RL in Reasoning Models
Runze Liu
Jiakang Wang
Yuling Shi
Zhihui Xie
Chenxin An
...
Wenping Hu
Xiu Li
Fuzheng Zhang
Guorui Zhou
Kun Gai
OffRL, LRM
98
3
0
30 Sep 2025
Hybrid Reward Normalization for Process-supervised Non-verifiable Agentic Tasks
Peiran Xu
Ruoyao Xiao
Xiaoying Xing
Guannan Zhang
Debiao Li
Kunyu Shi
OffRL, LRM
60
1
0
29 Sep 2025
From $f(x)$ and $g(x)$ to $f(g(x))$: LLMs Learn New Skills in RL by Composing Old Ones
L. Yuan
Weize Chen
Yuchen Zhang
Ganqu Cui
Hanbin Wang
Ziming You
Ning Ding
Zhiyuan Liu
Maosong Sun
Hao Peng
OffRL, CLL
112
1
0
29 Sep 2025
Rethinking Reward Miscalibration of GRPO in Agentic RL
Jingyu Liu
Xiaopeng Wu
Jingquan Peng
Kehan Chen
Chuan Yu
Lizhong Ding
Yong Liu
116
0
0
28 Sep 2025
Variational Reasoning for Language Models
Xiangxin Zhou
Zichen Liu
Haonan Wang
Chao Du
Min Lin
Chongxuan Li
Liang Wang
Tianyu Pang
OffRL, LRM
129
0
0
26 Sep 2025
Learn the Ropes, Then Trust the Wins: Self-imitation with Progressive Exploration for Agentic Reinforcement Learning
Yulei Qin
Xiaoyu Tan
Zhengbao He
Gang Li
Haojia Lin
...
Yuzheng Cai
Xuan Zhang
Sheng Ye
Ke Li
Xing Sun
255
0
0
26 Sep 2025
Language Models Can Learn from Verbal Feedback Without Scalar Rewards
Renjie Luo
Zichen Liu
Xiangyan Liu
Chao Du
Min Lin
Wenhu Chen
Wei Lu
Tianyu Pang
OffRL
96
2
0
26 Sep 2025
GRPO is Secretly a Process Reward Model
Michael Sullivan
106
0
0
25 Sep 2025
Learning GUI Grounding with Spatial Reasoning from Visual Feedback
Yu Zhao
Wei Chen
Huseyin A. Inan
Samuel Kessler
Lu Wang
...
Fangkai Yang
Chaoyun Zhang
Pasquale Minervini
Saravan Rajmohan
Robert Sim
88
1
0
25 Sep 2025
Calibrated Reasoning: An Explanatory Verifier for Dynamic and Efficient Problem-Solving
Anisha Garg
Engin Tekin
Yash More
David Bick
Nishit Neema
Ganesh Venkatesh
LRM
68
1
0
24 Sep 2025
Agentic Reinforcement Learning with Implicit Step Rewards
Xiaoqian Liu
Ke Wang
Yuchuan Wu
Fei Huang
Y. Li
Junge Zhang
Jianbin Jiao
OffRL
118
0
0
23 Sep 2025
SCAN: Self-Denoising Monte Carlo Annotation for Robust Process Reward Learning
Yuyang Ding
Xinyu Shi
Juntao Li
Xiaobo Liang
Zhaopeng Tu
Min Zhang
SyDa
142
2
0
20 Sep 2025
THOR: Tool-Integrated Hierarchical Optimization via RL for Mathematical Reasoning
Qikai Chang
Zhenrong Zhang
Pengfei Hu
Jiefeng Ma
Yicheng Pan
Jianshu Zhang
Jun Du
Quan Liu
J. Gao
OffRL, LRM
88
0
0
17 Sep 2025
SimpleVLA-RL: Scaling VLA Training via Reinforcement Learning
Xue Yang
Yuxin Zuo
Jiale Yu
Xicheng Zhang
Z. Yang
...
Shanghang Zhang
Y. Wang
Yao Mu
Bowen Zhou
Ning Ding
OffRL, LRM
99
17
0
11 Sep 2025
Beyond Correctness: Harmonizing Process and Outcome Rewards through RL Training
Chenlu Ye
Zhou Yu
Ziji Zhang
Hao Chen
Narayanan Sadagopan
Jing-Fu Huang
Tong Zhang
Anurag Beniwal
LRM
71
8
0
03 Sep 2025
Implicit Actor Critic Coupling via a Supervised Learning Framework for RLVR
Jiaming Li
Longze Chen
Ze Gong
Yukun Chen
Lu Wang
Wanwei He
Run Luo
Min Yang
LRM
53
0
0
02 Sep 2025
Know When to Explore: Difficulty-Aware Certainty as a Guide for LLM Reinforcement Learning
Ang Li
Zhihang Yuan
Yang Zhang
Shouda Liu
Yisen Wang
100
3
0
29 Aug 2025
Inference-Time Alignment Control for Diffusion Models with Reinforcement Learning Guidance
Luozhijie Jin
Zijie Qiu
J. Liu
Zijie Diao
Lifeng Qiao
Ning Ding
Alex Lamb
Xipeng Qiu
AI4CE
90
2
0
28 Aug 2025
SWIRL: A Staged Workflow for Interleaved Reinforcement Learning in Mobile GUI Control
Quanfeng Lu
Zhantao Ma
Shuai Zhong
Jin Wang
Dahai Yu
Michael K. Ng
Ping Luo
144
0
0
27 Aug 2025
StepWiser: Stepwise Generative Judges for Wiser Reasoning
Wei Xiong
Wenting Zhao
Weizhe Yuan
O. Yu. Golovneva
Tong Zhang
Jason Weston
Sainbayar Sukhbaatar
LRM
68
11
0
26 Aug 2025
InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency
Weiyun Wang
Zhangwei Gao
Lixin Gu
Hengjun Pu
Long Cui
...
Bowen Zhou
Kai Chen
Yu Qiao
Wenhai Wang
Gen Luo
MLLM, LRM
210
182
0
25 Aug 2025