Let's Verify Step by Step

Hunter Lightman, Vineet Kosaraju, Yura Burda, Harrison Edwards, Bowen Baker, Teddy Lee, Jan Leike, John Schulman, Ilya Sutskever, Karl Cobbe
International Conference on Learning Representations (ICLR), 2024 · arXiv:2305.20050, 31 May 2023
ALM, OffRL, LRM
arXiv (abs) · PDF · HTML · HuggingFace (10 upvotes)

Papers citing "Let's Verify Step by Step"

Showing 50 of 1,441 citing papers.
C²GSPG: Confidence-calibrated Group Sequence Policy Gradient towards Self-aware Reasoning
Haotian Liu, Shuo Wang, Hongteng Xu
LRM · 24 Dec 2025

Select2Reason: Efficient Instruction-Tuning Data Selection for Long-CoT Reasoning
Cehao Yang, Xueyuan Lin, Chengjin Xu, Xuhui Jiang, Xiaojun Wu, Honghao Liu, Hui Xiong, Jian Guo
LRM · 24 Dec 2025

Visual Reasoning Tracer: Object-Level Grounded Reasoning Benchmark
Haobo Yuan, Yueyi Sun, Yanwei Li, Tao Zhang, XueQing Deng, Henghui Ding, Lu Qi, Anran Wang, X. Li, Ming-Hsuan Yang
ReLM, LRM · 04 Dec 2025

Learning to Orchestrate Agents in Natural Language with the Conductor
Stefan Nielsen, Edoardo Cetin, Peter Schwendeman, Qi Sun, Jinglue Xu, Yujin Tang
LLMAG · 04 Dec 2025

TRINITY: An Evolved LLM Coordinator
Jinglue Xu, Qi Sun, Peter Schwendeman, Stefan Nielsen, Edoardo Cetin, Yujin Tang
LLMAG · 04 Dec 2025

On the Limits of Test-Time Compute: Sequential Reward Filtering for Better Inference
Yue Yu, Qiwei Di, Quanquan Gu, Dongruo Zhou
BDL · 04 Dec 2025

CARL: Critical Action Focused Reinforcement Learning for Multi-Step Agent
Leyang Shen, Y. Zhang, Chun Kai Ling, Xiaoyan Zhao, Tat-Seng Chua
04 Dec 2025

A Preliminary Study on the Promises and Challenges of Native Top-k Sparse Attention
Di Xiu, Hongyin Tang, Bolin Rong, Lizhi Yan, Jingang Wang, Yifan Lu, Xunliang Cai
03 Dec 2025

Cross-Lingual Prompt Steerability: Towards Accurate and Robust LLM Behavior across Languages
Lechen Zhang, Yusheng Zhou, Tolga Ergen, Lajanugen Logeswaran, Moontae Lee, David Jurgens
LRM · 02 Dec 2025

SR-GRPO: Stable Rank as an Intrinsic Geometric Reward for Large Language Model Alignment
Yixuan Tang, Yi Yang
ALM · 02 Dec 2025

E-valuator: Reliable Agent Verifiers with Sequential Hypothesis Testing
Shuvom Sadhuka, Drew Prinster, Clara Fannjiang, Gabriele Scalia, Aviv Regev, Hanchen Wang
02 Dec 2025

Self-Improving AI Agents through Self-Play
Przemyslaw Chojecki
02 Dec 2025

When Does Verification Pay Off? A Closer Look at LLMs as Solution Verifiers
Jack Lu, Ryan Teehan, Jinran Jin, Mengye Ren
LRM · 02 Dec 2025

Hierarchical Process Reward Models are Symbolic Vision Learners
Shan Zhang, Aotian Chen, Kai Zou, Jindong Gu, Yuan Xue, Anton van den Hengel
02 Dec 2025

SPARK: Stepwise Process-Aware Rewards for Reference-Free Reinforcement Learning
Salman Rahman, Sruthi Gorantla, Arpit Gupta, Swastik Roy, Nanyun Peng, Yang Liu
OffRL, LRM · 02 Dec 2025

CryptoQA: A Large-scale Question-answering Dataset for AI-assisted Cryptography
Mayar Elfares, Pascal Reisert, Tilman Dietz, Manpa Barman, Ahmed Zaki, Ralf Küsters, Andreas Bulling
ELM · 02 Dec 2025

Plantain: Plan-Answer Interleaved Reasoning
Anthony Liang, Jonathan Berant, Adam Fisch, Abhimanyu Goyal, Kalpesh Krishna, Jacob Eisenstein
ReLM, LRM · 02 Dec 2025

Artemis: Structured Visual Reasoning for Perception Policy Learning
Wei Tang, Yanpeng Sun, Shan Zhang, Xiaofan Li, Piotr Koniusz, Wei Li, Na Zhao, Z. Li
LRM, VLM · 01 Dec 2025

The Art of Scaling Test-Time Compute for Large Language Models
Aradhye Agarwal, Ayan Sengupta, Tanmoy Chakraborty
LRM · 01 Dec 2025

Beyond SFT: Reinforcement Learning for Safer Large Reasoning Models with Better Reasoning Ability
Jinghan Jia, Nathalie Baracaldo, Sijia Liu
OffRL, ReLM, LRM · 01 Dec 2025

Teaching by Failure: Counter-Example-Driven Curricula for Transformer Self-Improvement
Harshil Vejendla
01 Dec 2025

Rectifying LLM Thought from Lens of Optimization
J. Liu, Hongwei Liu, Songyang Zhang, Kai Chen
LRM · 01 Dec 2025

Optimizing Generative Ranking Relevance via Reinforcement Learning in Xiaohongshu Search
Ziyang Zeng, Heming Jing, Jindong Chen, X. Li, Hongyu Liu, ..., Yuqing Yang, Shaosheng Cao, Jun Fan, Yi-Chen Wu, Yao Hu
LRM · 30 Nov 2025

SCALE: Selective Resource Allocation for Overcoming Performance Bottlenecks in Mathematical Test-time Scaling
Yang Xiao, Chunpu Xu, Ruifeng Yuan, Jiashuo Wang, Wenjie Li, Pengfei Liu
LRM · 29 Nov 2025

EDIT: Early Diffusion Inference Termination for dLLMs Based on Dynamics of Training Gradients
He-Yen Hsieh, Hong Wang, H. T. Kung
29 Nov 2025

From Illusion to Intention: Visual Rationale Learning for Vision-Language Reasoning
C. Wang, Haozhe Wang, Xi Chen, J. Liu, Taofeng Xue, Chong Peng, Donglian Qi, Fangzhen Lin, Yunfeng Yan
OffRL, LRM · 28 Nov 2025

TIM-PRM: Verifying multimodal reasoning with Tool-Integrated PRM
Peng Kuang, X. Wang, Wentao Liu, Jian Dong, Kaidi Xu
MU, LRM · 28 Nov 2025

Every Token Counts: Generalizing 16M Ultra-Long Context in Large Language Models
X. S. Hu, Zhanchao Zhou, Ruiqi Liang, Zehuan Li, Wei Wu, Jianguo Li
28 Nov 2025

Adversarial Training for Process Reward Models
Gurusha Juneja, Deepak Nathani, William Yang Wang
LRM · 28 Nov 2025

OBLR-PO: A Theoretical Framework for Stable Reinforcement Learning
Zixun Huang, Jiayi Sheng, Zeyu Zheng
OffRL · 28 Nov 2025

ITS3D: Inference-Time Scaling for Text-Guided 3D Diffusion Models
Zhenglin Zhou, Fan Ma, Xiaobo Xia, Hehe Fan, Yi Yang, Tat-Seng Chua
DiffM, 3DGS · 27 Nov 2025

Video Generation Models Are Good Latent Reward Models
Xiaoyue Mi, W. Yu, Jiesong Lian, Shibo Jie, Ruizhe Zhong, ..., Z. Zhou, Zhiyong Xu, Yuan Zhou, Qinglin Lu, Fan Tang
EGVM, VGen · 26 Nov 2025

A Unified Evaluation-Instructed Framework for Query-Dependent Prompt Optimization
Ke Chen, Yifeng Wang, Hassan Almosapeeh, Haohan Wang
25 Nov 2025

RPM-MCTS: Knowledge-Retrieval as Process Reward Model with Monte Carlo Tree Search for Code Generation
Yuanyuan Lin, Xiangyu Ouyang, Teng Zhang, Kaixin Sui
25 Nov 2025

CodeV: Code with Images for Faithful Visual Reasoning via Tool-Aware Policy Optimization
X. Hou, Shaoyuan Xu, Manan Biyani, Mayan Li, Jia-Wei Liu, Todd C. Hollon, Bryan Wang
24 Nov 2025

Think Before You Prune: Selective Self-Generated Calibration for Pruning Large Reasoning Models
Yang Xiang, Yixin Ji, Juntao Li, Min Zhang
LRM · 24 Nov 2025

Majority of the Bests: Improving Best-of-N via Bootstrapping
Amin Rakhsha, Kanika Madan, Tianyu Zhang, Amir-massoud Farahmand, Amir Khasahmadi
23 Nov 2025

Transformers with RL or SFT Provably Learn Sparse Boolean Functions, But Differently
Bochen Lyu, Yiyang Jia, Xiaohao Cai, Zhanxing Zhu
MoE · 22 Nov 2025

SPINE: Token-Selective Test-Time Reinforcement Learning with Entropy-Band Regularization
Jianghao Wu, Yasmeen George, Jin Ye, Y. Wu, Daniel F. Schmidt, Jianfei Cai
LRM · 22 Nov 2025

Asking LLMs to Verify First is Almost Free Lunch
Shiguang Wu, Quanming Yao
ReLM, LRM · 21 Nov 2025

The PLLuM Instruction Corpus
Piotr Pęzik, Filip Żarnecki, Konrad Kaczyński, A. Cichosz, Zuzanna Deckert, ..., Konrad Wojtasik, Arkadiusz Janz, P. Kazienko, Julia Moska, Jan Kocoń
21 Nov 2025

Nemotron Elastic: Towards Efficient Many-in-One Reasoning LLMs
Ali Taghibakhshi, Sharath Turuvekere Sreenivas, Saurav Muralidharan, Ruisi Cai, Marcin Chochowski, ..., Jan Kautz, Bryan Catanzaro, Ashwath Aithal, Nima Tajbakhsh, Pavlo Molchanov
20 Nov 2025

Cognitive Foundations for Reasoning and Their Manifestation in LLMs
Priyanka Kargupta, Shuyue Stella Li, Haocheng Wang, Jinu Lee, Shan Chen, ..., Thomas L. Griffiths, Max Kleiman-Weiner, Jiawei Han, Asli Celikyilmaz, Yulia Tsvetkov
LRM · 20 Nov 2025

Distributed Agent Reasoning Across Independent Systems With Strict Data Locality
Daniel Vaughan, Kateřina Vaughan
20 Nov 2025

VideoSeg-R1: Reasoning Video Object Segmentation via Reinforcement Learning
Zishan Xu, Yifu Guo, Y. Lu, Fengyu Yang, J. Li
VOS · 20 Nov 2025

JudgeBoard: Benchmarking and Enhancing Small Language Models for Reasoning Evaluation
Zhenyu Bi, Gaurav Srivastava, Yang Li, Meng Lu, Swastik Roy, Morteza Ziyadi, Xuan Wang
ELM · 20 Nov 2025

Incorporating Self-Rewriting into Large Language Model Reasoning Reinforcement
Jiashu Yao, Heyan Huang, Shuang Zeng, Chuwei Luo, Wangjie You, Jie Tang, Qingsong Liu, Yuhang Guo, Yangyang Kang
ReLM, KELM · 20 Nov 2025

EntroPIC: Towards Stable Long-Term Training of LLMs via Entropy Stabilization with Proportional-Integral Control
Kai Yang, Xin Xu, Yangkun Chen, Weijie Liu, Jiafei Lyu, Zichuan Lin, Deheng Ye, Saiyong Yang
19 Nov 2025

From Solving to Verifying: A Unified Objective for Robust Reasoning in LLMs
Xiaoxuan Wang, Bo Liu, Song Jiang, Jingzhou Liu, Jingyuan Qi, Xia Chen, Baosheng He
LRM · 19 Nov 2025

Thinking, Faithful and Stable: Mitigating Hallucinations in LLMs
Chelsea Zou, Yiheng Yao, Basant Khalil
HILM · 19 Nov 2025