Self-rewarding correction for mathematical reasoning

26 February 2025
Wei Xiong
Hanning Zhang
Chenlu Ye
Lichang Chen
Nan Jiang
Tong Zhang
ReLM, KELM, LRM
ArXiv (abs) | PDF | HTML | HuggingFace (84 upvotes) | GitHub

Papers citing "Self-rewarding correction for mathematical reasoning"

28 / 28 papers shown
Agent0-VL: Exploring Self-Evolving Agent for Tool-Integrated Vision-Language Reasoning
Jiaqi Liu
Kaiwen Xiong
Peng Xia
Yiyang Zhou
Haonian Ji
Lu Feng
S. Han
Mingyu Ding
Huaxiu Yao
LLMAG, LRM, VLM
532
11
0
25 Nov 2025
From Solving to Verifying: A Unified Objective for Robust Reasoning in LLMs
Xiaoxuan Wang
Bo Liu
Song Jiang
Jingzhou Liu
Jingyuan Qi
Xia Chen
Baosheng He
LRM
211
3
0
19 Nov 2025
From Exploration to Exploitation: A Two-Stage Entropy RLVR Approach for Noise-Tolerant MLLM Training
Donglai Xu
Hongzheng Yang
Yuzhi Zhao
Pingping Zhang
Jinpeng Chen
...
Xiaolei Li
Senkang Hu
Ziyi Guan
Jason Chun Lok Li
L. Po
NoLa
184
1
0
11 Nov 2025
What Defines Good Reasoning in LLMs? Dissecting Reasoning Steps with Multi-Aspect Evaluation
Heejin Do
Jaehui Hwang
Dongyoon Han
Seong Joon Oh
Sangdoo Yun
ELM, LRM
281
3
1
23 Oct 2025
Rewarding the Journey, Not Just the Destination: A Composite Path and Answer Self-Scoring Reward Mechanism for Test-Time Reinforcement Learning
Chenwei Tang
Jingyu Xing
Xinyu Liu
Wei Ju
Jiancheng Lv
Fan Zhang
Deng Xiong
Ziyue Qiao
LRM
289
2
0
20 Oct 2025
Diagnosing and Mitigating System Bias in Self-Rewarding RL
Chuyi Tan
Peiwen Yuan
Xinglin Wang
Yiwei Li
Shaoxiong Feng
...
Jiayi Shi
Ji Zhang
Boyuan Pan
Yao Hu
Kan Li
139
0
0
10 Oct 2025
LightReasoner: Can Small Language Models Teach Large Language Models Reasoning?
Jingyuan Wang
Yankai Chen
Zhonghang Li
Chao Huang
LRM
134
0
0
09 Oct 2025
Enhancing Large Language Model Reasoning with Reward Models: An Analytical Survey
Qiyuan Liu
Hao Xu
Xuhong Chen
Wei Chen
Yee Whye Teh
Ning Miao
ReLM, LRM, AI4CE
333
4
0
02 Oct 2025
Understanding the Thinking Process of Reasoning Models: A Perspective from Schoenfeld's Episode Theory
Ming Li
Nan Zhang
Chenrui Fan
Hong Jiao
Yanbin Fu
Sydney Peters
Qingshu Xu
Robert Lissitz
Tianyi Zhou
LRM
185
7
0
18 Sep 2025
Beyond Correctness: Harmonizing Process and Outcome Rewards through RL Training
Chenlu Ye
Zhou Yu
Ziji Zhang
Hao Chen
Narayanan Sadagopan
Jing-Fu Huang
Tong Zhang
Anurag Beniwal
LRM
192
16
0
03 Sep 2025
PAG: Multi-Turn Reinforced LLM Self-Correction with Policy as Generative Verifier
Y. Jiang
Yuwen Xiong
Yufeng Yuan
Chao Xin
Wenyuan Xu
Yu Yue
Qianchuan Zhao
Lin Yan
LRM
345
17
0
12 Jun 2025
A Survey on Large Language Models for Mathematical Reasoning
Peng-Yuan Wang
Tian-Shuo Liu
Chenyang Wang
Yi-Di Wang
Shu Yan
...
Xu-Hui Liu
Xin-Wei Chen
Jia-Cheng Xu
Ziniu Li
Yang Yu
LRM
368
34
0
10 Jun 2025
Boosting LLM Reasoning via Spontaneous Self-Correction
Xutong Zhao
Tengyu Xu
Xuewei Wang
Zhengxing Chen
Di Jin
...
Yun He
Sinong Wang
Han Fang
Sarath Chandar
Chen Zhu
ReLM, LRM, KELM
298
10
0
07 Jun 2025
MoDoMoDo: Multi-Domain Data Mixtures for Multimodal LLM Reinforcement Learning
Yiqing Liang
Jielin Qiu
Wenhao Ding
Zuxin Liu
James Tompkin
Mengdi Xu
Mengzhou Xia
Zhengzhong Tu
Laixi Shi
Jiacheng Zhu
OffRL
523
17
0
30 May 2025
Sherlock: Self-Correcting Reasoning in Vision-Language Models
Yi Ding
Ruqi Zhang
ReLM, LRM, VLM
375
8
0
28 May 2025
Surrogate Signals from Format and Length: Reinforcement Learning for Solving Mathematical Problems without Ground Truth Answers
Rihui Xin
Han Liu
Zecheng Wang
Yupeng Zhang
Dianbo Sui
Xiaolin Hu
Bingning Wang
SyDa
383
4
0
26 May 2025
Trust, But Verify: A Self-Verification Approach to Reinforcement Learning with Verifiable Rewards
Xiaoyuan Liu
Tian Liang
Zhiwei He
Jiahao Xu
Wenxuan Wang
Pinjia He
Zhaopeng Tu
Haitao Mi
Dong Yu
OffRL, ReLM, LRM
418
22
0
19 May 2025
Scalable Chain of Thoughts via Elastic Reasoning
Yuhui Xu
Hanze Dong
Lei Wang
Doyen Sahoo
Junnan Li
Caiming Xiong
OffRL, LRM
494
32
0
08 May 2025
Sailing by the Stars: A Survey on Reward Models and Learning Strategies for Learning from Rewards
Xiaobao Wu
LRM
767
5
0
05 May 2025
Optimizing Chain-of-Thought Reasoners via Gradient Variance Minimization in Rejection Sampling and RL
Jiarui Yao
Yifan Hao
Hanning Zhang
Hanze Dong
Wei Xiong
Nan Jiang
Tong Zhang
LRM
508
16
0
05 May 2025
Process Reward Models That Think
Muhammad Khalifa
Rishabh Agarwal
Lajanugen Logeswaran
Jaekyeom Kim
Hao Peng
Moontae Lee
Honglak Lee
Lu Wang
OffRL, ALM, LRM
625
59
0
23 Apr 2025
Stop Summation: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning
Jie Cheng
Ruixi Qiao
Lijun Li
Chao Guo
Chao Guo
Gang Xiong
Yisheng Lv
Fei-Yue Wang
LRM
1.0K
28
0
21 Apr 2025
A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce
Wei Xiong
Jiarui Yao
Yuhui Xu
Bo Pang
Lei Wang
...
Junnan Li
Nan Jiang
Tong Zhang
Caiming Xiong
Hanze Dong
OffRL, LRM
530
110
0
15 Apr 2025
Missing Premise exacerbates Overthinking: Are Reasoning Models losing Critical Thinking Skill?
Chenrui Fan
Ming Li
Lichao Sun
Tianyi Zhou
LRM
489
43
0
09 Apr 2025
Right Question is Already Half the Answer: Fully Unsupervised LLM Reasoning Incentivization
Qingyang Zhang
Haitao Wu
Changqing Zhang
Peilin Zhao
Yatao Bian
ReLM, LRM
771
90
0
08 Apr 2025
Enhancing LLM Reasoning with Iterative DPO: A Comprehensive Empirical Investigation
Songjun Tu
Jiahao Lin
Xiangyu Tian
Qichao Zhang
Linjing Li
...
Nan Xu
Wei He
Xiangyuan Lan
Shihong Deng
Dongbin Zhao
LRM
583
21
0
17 Mar 2025
DPO Meets PPO: Reinforced Token Optimization for RLHF
Han Zhong
Zikang Shan
Guhao Feng
Wei Xiong
Xinle Cheng
Li Zhao
Di He
Jiang Bian
Liwei Wang
744
112
0
29 Apr 2024
Self-Rewarding Language Models
Weizhe Yuan
Richard Yuanzhe Pang
Kyunghyun Cho
Xian Li
Sainbayar Sukhbaatar
Jing Xu
Jason Weston
ReLM, SyDa, ALM, LRM
978
533
0
18 Jan 2024