2502.19613

Self-rewarding correction for mathematical reasoning

26 February 2025

ArXiv (abs) | PDF | HTML | HuggingFace (84 upvotes) | GitHub

Papers citing "Self-rewarding correction for mathematical reasoning"

28 / 28 papers shown

Agent0-VL: Exploring Self-Evolving Agent for Tool-Integrated Vision-Language Reasoning

532

11

0

25 Nov 2025

From Solving to Verifying: A Unified Objective for Robust Reasoning in LLMs

211

3

0

19 Nov 2025

From Exploration to Exploitation: A Two-Stage Entropy RLVR Approach for Noise-Tolerant MLLM Training

...

Jason Chun Lok Li

184

1

0

11 Nov 2025

What Defines Good Reasoning in LLMs? Dissecting Reasoning Steps with Multi-Aspect Evaluation

281

3

1

23 Oct 2025

Rewarding the Journey, Not Just the Destination: A Composite Path and Answer Self-Scoring Reward Mechanism for Test-Time Reinforcement Learning

289

2

0

20 Oct 2025

Diagnosing and Mitigating System Bias in Self-Rewarding RL

...

139

0

0

10 Oct 2025

LightReasoner: Can Small Language Models Teach Large Language Models Reasoning?

134

0

0

09 Oct 2025

Enhancing Large Language Model Reasoning with Reward Models: An Analytical Survey

333

4

0

02 Oct 2025

Understanding the Thinking Process of Reasoning Models: A Perspective from Schoenfeld's Episode Theory

185

7

0

18 Sep 2025

Beyond Correctness: Harmonizing Process and Outcome Rewards through RL Training

Narayanan Sadagopan

192

16

0

03 Sep 2025

PAG: Multi-Turn Reinforced LLM Self-Correction with Policy as Generative Verifier

345

17

0

12 Jun 2025

A Survey on Large Language Models for Mathematical Reasoning

...

368

34

0

10 Jun 2025

Boosting LLM Reasoning via Spontaneous Self-Correction

...

298

10

0

07 Jun 2025

MoDoMoDo: Multi-Domain Data Mixtures for Multimodal LLM Reinforcement Learning

523

17

0

30 May 2025

Sherlock: Self-Correcting Reasoning in Vision-Language Models

375

8

0

28 May 2025

Surrogate Signals from Format and Length: Reinforcement Learning for Solving Mathematical Problems without Ground Truth Answers

383

4

0

26 May 2025

Trust, But Verify: A Self-Verification Approach to Reinforcement Learning with Verifiable Rewards

418

22

0

19 May 2025

Scalable Chain of Thoughts via Elastic Reasoning

494

32

0

08 May 2025

Sailing by the Stars: A Survey on Reward Models and Learning Strategies for Learning from Rewards

767

5

0

05 May 2025

Optimizing Chain-of-Thought Reasoners via Gradient Variance Minimization in Rejection Sampling and RL

508

16

0

05 May 2025

Process Reward Models That Think

Muhammad Khalifa

Rishabh Agarwal

Lajanugen Logeswaran

625

59

0

23 Apr 2025

Stop Summation: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning

1.0K

28

0

21 Apr 2025

A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce

...

530

110

0

15 Apr 2025

Missing Premise exacerbates Overthinking: Are Reasoning Models losing Critical Thinking Skill?

489

43

0

09 Apr 2025

Right Question is Already Half the Answer: Fully Unsupervised LLM Reasoning Incentivization

Changqing Zhang

771

90

0

08 Apr 2025

Enhancing LLM Reasoning with Iterative DPO: A Comprehensive Empirical Investigation

...

583

21

0

17 Mar 2025

DPO Meets PPO: Reinforced Token Optimization for RLHF

744

112

0

29 Apr 2024

Self-Rewarding Language Models

Richard Yuanzhe Pang

Xian Li

Sainbayar Sukhbaatar

Jason Weston

ReLM SyDa ALM LRM

978

533

0

18 Jan 2024
