Let's Verify Step by Step

Hunter Lightman, Vineet Kosaraju, Yura Burda, Harrison Edwards, Bowen Baker, Teddy Lee, Jan Leike, John Schulman, Ilya Sutskever, Karl Cobbe
International Conference on Learning Representations (ICLR), 2024 · arXiv:2305.20050, 31 May 2023
ALM, OffRL, LRM
arXiv (abs) · PDF · HTML · HuggingFace (10 upvotes)

Papers citing "Let's Verify Step by Step"

Showing 50 of 1,441 citing papers.
C²GSPG: Confidence-calibrated Group Sequence Policy Gradient towards Self-aware Reasoning
Haotian Liu, Shuo Wang, Hongteng Xu
LRM · 24 Dec 2025

Select2Reason: Efficient Instruction-Tuning Data Selection for Long-CoT Reasoning
Cehao Yang, Xueyuan Lin, Chengjin Xu, Xuhui Jiang, Xiaojun Wu, Honghao Liu, Hui Xiong, Jian Guo
LRM · 24 Dec 2025

Visual Reasoning Tracer: Object-Level Grounded Reasoning Benchmark
Haobo Yuan, Yueyi Sun, Yanwei Li, Tao Zhang, XueQing Deng, Henghui Ding, Lu Qi, Anran Wang, X. Li, Ming-Hsuan Yang
ReLM, LRM · 04 Dec 2025

Learning to Orchestrate Agents in Natural Language with the Conductor
Stefan Nielsen, Edoardo Cetin, Peter Schwendeman, Qi Sun, Jinglue Xu, Yujin Tang
LLMAG · 04 Dec 2025

TRINITY: An Evolved LLM Coordinator
Jinglue Xu, Qi Sun, Peter Schwendeman, Stefan Nielsen, Edoardo Cetin, Yujin Tang
LLMAG · 04 Dec 2025

On the Limits of Test-Time Compute: Sequential Reward Filtering for Better Inference
Yue Yu, Qiwei Di, Quanquan Gu, Dongruo Zhou
BDL · 04 Dec 2025

CARL: Critical Action Focused Reinforcement Learning for Multi-Step Agent
Leyang Shen, Y. Zhang, Chun Kai Ling, Xiaoyan Zhao, Tat-Seng Chua
04 Dec 2025

A Preliminary Study on the Promises and Challenges of Native Top-k Sparse Attention
Di Xiu, Hongyin Tang, Bolin Rong, Lizhi Yan, Jingang Wang, Yifan Lu, Xunliang Cai
03 Dec 2025

Cross-Lingual Prompt Steerability: Towards Accurate and Robust LLM Behavior across Languages
Lechen Zhang, Yusheng Zhou, Tolga Ergen, Lajanugen Logeswaran, Moontae Lee, David Jurgens
LRM · 02 Dec 2025

SR-GRPO: Stable Rank as an Intrinsic Geometric Reward for Large Language Model Alignment
Yixuan Tang, Yi Yang
ALM · 02 Dec 2025

E-valuator: Reliable Agent Verifiers with Sequential Hypothesis Testing
Shuvom Sadhuka, Drew Prinster, Clara Fannjiang, Gabriele Scalia, Aviv Regev, Hanchen Wang
02 Dec 2025

Self-Improving AI Agents through Self-Play
Przemyslaw Chojecki
02 Dec 2025

When Does Verification Pay Off? A Closer Look at LLMs as Solution Verifiers
Jack Lu, Ryan Teehan, Jinran Jin, Mengye Ren
LRM · 02 Dec 2025

Hierarchical Process Reward Models are Symbolic Vision Learners
Shan Zhang, Aotian Chen, Kai Zou, Jindong Gu, Yuan Xue, Anton van den Hengel
02 Dec 2025

SPARK: Stepwise Process-Aware Rewards for Reference-Free Reinforcement Learning
Salman Rahman, Sruthi Gorantla, Arpit Gupta, Swastik Roy, Nanyun Peng, Yang Liu
OffRL, LRM · 02 Dec 2025

CryptoQA: A Large-scale Question-answering Dataset for AI-assisted Cryptography
Mayar Elfares, Pascal Reisert, Tilman Dietz, Manpa Barman, Ahmed Zaki, Ralf Küsters, Andreas Bulling
ELM · 02 Dec 2025

Plantain: Plan-Answer Interleaved Reasoning
Anthony Liang, Jonathan Berant, Adam Fisch, Abhimanyu Goyal, Kalpesh Krishna, Jacob Eisenstein
ReLM, LRM · 02 Dec 2025

Artemis: Structured Visual Reasoning for Perception Policy Learning
Wei Tang, Yanpeng Sun, Shan Zhang, Xiaofan Li, Piotr Koniusz, Wei Li, Na Zhao, Z. Li
LRM, VLM · 01 Dec 2025

The Art of Scaling Test-Time Compute for Large Language Models
Aradhye Agarwal, Ayan Sengupta, Tanmoy Chakraborty
LRM · 01 Dec 2025

Beyond SFT: Reinforcement Learning for Safer Large Reasoning Models with Better Reasoning Ability
Jinghan Jia, Nathalie Baracaldo, Sijia Liu
OffRL, ReLM, LRM · 01 Dec 2025

Teaching by Failure: Counter-Example-Driven Curricula for Transformer Self-Improvement
Harshil Vejendla
01 Dec 2025

Rectifying LLM Thought from Lens of Optimization
J. Liu, Hongwei Liu, Songyang Zhang, Kai Chen
LRM · 01 Dec 2025

Optimizing Generative Ranking Relevance via Reinforcement Learning in Xiaohongshu Search
Ziyang Zeng, Heming Jing, Jindong Chen, X. Li, Hongyu Liu, ..., Yuqing Yang, Shaosheng Cao, Jun Fan, Yi-Chen Wu, Yao Hu
LRM · 30 Nov 2025

SCALE: Selective Resource Allocation for Overcoming Performance Bottlenecks in Mathematical Test-time Scaling
Yang Xiao, Chunpu Xu, Ruifeng Yuan, Jiashuo Wang, Wenjie Li, Pengfei Liu
LRM · 29 Nov 2025

EDIT: Early Diffusion Inference Termination for dLLMs Based on Dynamics of Training Gradients
He-Yen Hsieh, Hong Wang, H. T. Kung
29 Nov 2025

From Illusion to Intention: Visual Rationale Learning for Vision-Language Reasoning
C. Wang, Haozhe Wang, Xi Chen, J. Liu, Taofeng Xue, Chong Peng, Donglian Qi, Fangzhen Lin, Yunfeng Yan
OffRL, LRM · 28 Nov 2025

TIM-PRM: Verifying multimodal reasoning with Tool-Integrated PRM
Peng Kuang, X. Wang, Wentao Liu, Jian Dong, Kaidi Xu
MU, LRM · 28 Nov 2025

Every Token Counts: Generalizing 16M Ultra-Long Context in Large Language Models
X. S. Hu, Zhanchao Zhou, Ruiqi Liang, Zehuan Li, Wei Wu, Jianguo Li
28 Nov 2025

Adversarial Training for Process Reward Models
Gurusha Juneja, Deepak Nathani, William Yang Wang
LRM · 28 Nov 2025

OBLR-PO: A Theoretical Framework for Stable Reinforcement Learning
Zixun Huang, Jiayi Sheng, Zeyu Zheng
OffRL · 28 Nov 2025

ITS3D: Inference-Time Scaling for Text-Guided 3D Diffusion Models
Zhenglin Zhou, Fan Ma, Xiaobo Xia, Hehe Fan, Yi Yang, Tat-Seng Chua
DiffM, 3DGS · 27 Nov 2025

Video Generation Models Are Good Latent Reward Models
Xiaoyue Mi, W. Yu, Jiesong Lian, Shibo Jie, Ruizhe Zhong, ..., Z. Zhou, Zhiyong Xu, Yuan Zhou, Qinglin Lu, Fan Tang
EGVM, VGen · 26 Nov 2025

A Unified Evaluation-Instructed Framework for Query-Dependent Prompt Optimization
Ke Chen, Yifeng Wang, Hassan Almosapeeh, Haohan Wang
25 Nov 2025

RPM-MCTS: Knowledge-Retrieval as Process Reward Model with Monte Carlo Tree Search for Code Generation
Yuanyuan Lin, Xiangyu Ouyang, Teng Zhang, Kaixin Sui
25 Nov 2025

CodeV: Code with Images for Faithful Visual Reasoning via Tool-Aware Policy Optimization
X. Hou, Shaoyuan Xu, Manan Biyani, Mayan Li, Jia-Wei Liu, Todd C. Hollon, Bryan Wang
24 Nov 2025

Think Before You Prune: Selective Self-Generated Calibration for Pruning Large Reasoning Models
Yang Xiang, Yixin Ji, Juntao Li, Min Zhang
LRM · 24 Nov 2025

Majority of the Bests: Improving Best-of-N via Bootstrapping
Amin Rakhsha, Kanika Madan, Tianyu Zhang, Amir-massoud Farahmand, Amir Khasahmadi
23 Nov 2025

Transformers with RL or SFT Provably Learn Sparse Boolean Functions, But Differently
Bochen Lyu, Yiyang Jia, Xiaohao Cai, Zhanxing Zhu
MoE · 22 Nov 2025

SPINE: Token-Selective Test-Time Reinforcement Learning with Entropy-Band Regularization
Jianghao Wu, Yasmeen George, Jin Ye, Y. Wu, Daniel F. Schmidt, Jianfei Cai
LRM · 22 Nov 2025

Asking LLMs to Verify First is Almost Free Lunch
Shiguang Wu, Quanming Yao
ReLM, LRM · 21 Nov 2025

The PLLuM Instruction Corpus
Piotr Pęzik, Filip Żarnecki, Konrad Kaczyński, A. Cichosz, Zuzanna Deckert, ..., Konrad Wojtasik, Arkadiusz Janz, P. Kazienko, Julia Moska, Jan Kocoń
21 Nov 2025

Nemotron Elastic: Towards Efficient Many-in-One Reasoning LLMs
Ali Taghibakhshi, Sharath Turuvekere Sreenivas, Saurav Muralidharan, Ruisi Cai, Marcin Chochowski, ..., Jan Kautz, Bryan Catanzaro, Ashwath Aithal, Nima Tajbakhsh, Pavlo Molchanov
20 Nov 2025

Cognitive Foundations for Reasoning and Their Manifestation in LLMs
Priyanka Kargupta, Shuyue Stella Li, Haocheng Wang, Jinu Lee, Shan Chen, ..., Thomas L. Griffiths, Max Kleiman-Weiner, Jiawei Han, Asli Celikyilmaz, Yulia Tsvetkov
LRM · 20 Nov 2025

Distributed Agent Reasoning Across Independent Systems With Strict Data Locality
Daniel Vaughan, Kateřina Vaughan
20 Nov 2025

VideoSeg-R1: Reasoning Video Object Segmentation via Reinforcement Learning
Zishan Xu, Yifu Guo, Y. Lu, Fengyu Yang, J. Li
VOS · 20 Nov 2025

JudgeBoard: Benchmarking and Enhancing Small Language Models for Reasoning Evaluation
Zhenyu Bi, Gaurav Srivastava, Yang Li, Meng Lu, Swastik Roy, Morteza Ziyadi, Xuan Wang
ELM · 20 Nov 2025

Incorporating Self-Rewriting into Large Language Model Reasoning Reinforcement
Jiashu Yao, Heyan Huang, Shuang Zeng, Chuwei Luo, Wangjie You, Jie Tang, Qingsong Liu, Yuhang Guo, Yangyang Kang
ReLM, KELM · 20 Nov 2025

EntroPIC: Towards Stable Long-Term Training of LLMs via Entropy Stabilization with Proportional-Integral Control
Kai Yang, Xin Xu, Yangkun Chen, Weijie Liu, Jiafei Lyu, Zichuan Lin, Deheng Ye, Saiyong Yang
19 Nov 2025

From Solving to Verifying: A Unified Objective for Robust Reasoning in LLMs
Xiaoxuan Wang, Bo Liu, Song Jiang, Jingzhou Liu, Jingyuan Qi, Xia Chen, Baosheng He
LRM · 19 Nov 2025

Thinking, Faithful and Stable: Mitigating Hallucinations in LLMs
Chelsea Zou, Yiheng Yao, Basant Khalil
HILM · 19 Nov 2025