arXiv: 2305.20050
Let's Verify Step by Step
International Conference on Learning Representations (ICLR), 2024
31 May 2023
Hunter Lightman
V. Kosaraju
Yura Burda
Harrison Edwards
Bowen Baker
Teddy Lee
Jan Leike
John Schulman
Ilya Sutskever
K. Cobbe
ALM
OffRL
LRM
Papers citing "Let's Verify Step by Step"
50 / 1,441 papers shown
Test-Time Alignment of LLMs via Sampling-Based Optimal Control in pre-logit space
Sekitoshi Kanai
Tsukasa Yoshida
Hiroshi Takahashi
Haru Kuroki
Kazumune Hashimoto
112
0
0
30 Oct 2025
Kad: A Framework for Proxy-based Test-time Alignment with Knapsack Approximation Deferral
Ayoub Hammal
Pierre Zweigenbaum
Caio Corro
238
0
0
30 Oct 2025
The Oversight Game: Learning to Cooperatively Balance an AI Agent's Safety and Autonomy
William Overman
Mohsen Bayati
88
0
0
30 Oct 2025
Cross-Platform Evaluation of Reasoning Capabilities in Foundation Models
J. Curtò
I. D. Zarzà
Pablo García
Jordi Cabot
ELM
LRM
207
0
0
30 Oct 2025
Zero Reinforcement Learning Towards General Domains
Yuyuan Zeng
Yufei Huang
Can Xu
Qingfeng Sun
Jianfeng Yan
Guanghui Xu
Tao Yang
Fengzong Lian
OffRL
ReLM
LRM
AI4CE
162
0
0
29 Oct 2025
Reasoning-Aware GRPO using Process Mining
TaekHyun Park
Yongjae Lee
Hyerim Bae
LRM
42
0
0
29 Oct 2025
TextualVerifier: Verify TextGrad Step-by-Step
Eugenius Mario Situmorang
Adila Alfa Krisnadhi
Ari Wibisono
LRM
102
1
0
29 Oct 2025
Communication and Verification in LLM Agents towards Collaboration under Information Asymmetry
Run Peng
Ziqiao Ma
Amy Pang
Sikai Li
Zhang Xi-Jia
Yingzhuo Yu
Cristian-Paul Bara
Joyce Chai
LLMAG
137
0
0
29 Oct 2025
SymCode: A Neurosymbolic Approach to Mathematical Reasoning via Verifiable Code Generation
Sina Bagheri Nezhad
Yao Li
Ameeta Agrawal
LRM
95
1
0
29 Oct 2025
Are Language Models Efficient Reasoners? A Perspective from Logic Programming
Andreas Opedal
Yanick Zengaffinen
Haruki Shirakami
Clemente Pasti
Mrinmaya Sachan
Abulhair Saparov
Ryan Cotterell
Bernhard Schölkopf
ReLM
LRM
156
0
0
29 Oct 2025
Parrot: A Training Pipeline Enhances Both Program CoT and Natural Language CoT for Reasoning
Senjie Jin
Lu Chen
Zhiheng Xi
Yuhui Wang
Sirui Song
...
Peng Sun
Hong Lu
Tao Gui
Qi Zhang
Xuanjing Huang
ReLM
LRM
148
0
0
29 Oct 2025
A Survey on Efficient Large Language Model Training: From Data-centric Perspectives
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Junyu Luo
Bohan Wu
Xiao Luo
Zhiping Xiao
Yiqiao Jin
...
Nan Yin
Yifan Wang
Jingyang Yuan
Wei Ju
Ming Zhang
145
4
0
29 Oct 2025
Evaluating the Role of Verifiers in Test-Time Scaling for Legal Reasoning Tasks
Davide Romano
Jonathan Schwarz
Daniele Giofré
LRM
94
0
0
29 Oct 2025
Verifying Large Language Models' Reasoning Paths via Correlation Matrix Rank
Jiayu Liu
Wei Dai
Zhenya Huang
Ning Miao
Enhong Chen
LRM
90
0
0
28 Oct 2025
CritiCal: Can Critique Help LLM Uncertainty or Confidence Calibration?
Qing Zong
Jiayu Liu
Tianshi Zheng
Chunyang Li
Baixuan Xu
Haochen Shi
Weiqi Wang
Zhaowei Wang
Chunkit Chan
Yangqiu Song
104
2
0
28 Oct 2025
The Kinetics of Reasoning: How Chain-of-Thought Shapes Learning in Transformers?
Zihan Pengmei
Costas Mavromatis
Zhengyuan Shen
Yunyi Zhang
V. Ioannidis
Huzefa Rangwala
LRM
98
0
0
28 Oct 2025
SPICE: Self-Play In Corpus Environments Improves Reasoning
Bo Liu
Chuanyang Jin
Seungone Kim
Weizhe Yuan
Wenting Zhao
Ilia Kulikov
Xian Li
Sainbayar Sukhbaatar
Jack Lanchantin
Jason Weston
ReLM
LRM
237
9
0
28 Oct 2025
MASPRM: Multi-Agent System Process Reward Model
Milad Yazdani
Mahdi Mostajabdaveh
Zirui Zhou
Ying Xiong
93
0
0
28 Oct 2025
Process Reward Models for Sentence-Level Verification of LVLM Radiology Reports
Alois Thomas
M. Varma
Jean-Benoit Delbrouck
Curtis P. Langlotz
96
0
0
27 Oct 2025
PRISM-Bench: A Benchmark of Puzzle-Based Visual Tasks with CoT Error Detection
Yusu Qian
Cheng Wan
Chao Jia
Yinfei Yang
Qingyu Zhao
Zhe Gan
LRM
ReLM
507
1
0
27 Oct 2025
Smaller Models, Smarter Rewards: A Two-Sided Approach to Process and Outcome Rewards
Jan Niklas Groeneveld
Xi Qin
Alexander Schaefer
Yaad Oren
ALM
LRM
339
0
0
27 Oct 2025
Think before Recommendation: Autonomous Reasoning-enhanced Recommender
Xiaoyu Kong
Junguang Jiang
Bin Liu
Ziru Xu
Han Zhu
Jian Xu
Bo Zheng
Jiancan Wu
Xiang Wang
LRM
151
0
0
27 Oct 2025
Adaptive Blockwise Search: Inference-Time Alignment for Large Language Models
Mohammad Atif Quamar
Mohammad Areeb
Nishant Sharma
Ananth Shreekumar
Jonathan Rosenthal
Muslum Ozgur Ozmen
Mikhail Kuznetsov
Z. Berkay Celik
88
0
0
27 Oct 2025
Once Upon an Input: Reasoning via Per-Instance Program Synthesis
Adam Stein
Neelay Velingker
Mayur Naik
Eric Wong
ReLM
LRM
173
0
0
26 Oct 2025
FAPO: Flawed-Aware Policy Optimization for Efficient and Reliable Reasoning
Yuyang Ding
Chi Zhang
Juntao Li
H. Lin
Xin Liu
Min-Ling Zhang
OffRL
LRM
160
1
0
26 Oct 2025
Mapping Faithful Reasoning in Language Models
Jiazheng Li
Andreas Damianou
J Rosser
José Luis Redondo García
Konstantina Palla
LRM
103
0
0
25 Oct 2025
When Fewer Layers Break More Chains: Layer Pruning Harms Test-Time Scaling in LLMs
Keyu Wang
Tian Lyu
Guinan Su
Jonas Geiping
L. Yin
Marco Canini
Shiwei Liu
LRM
118
1
0
25 Oct 2025
Weak-to-Strong Generalization under Distribution Shifts
Myeongho Jeon
Jan Sobotka
Suhwan Choi
Maria Brbić
OOD
195
0
0
24 Oct 2025
Boosting Accuracy and Efficiency of Budget Forcing in LLMs via Reinforcement Learning for Mathematical Reasoning
Ravindra Aribowo Tarunokusumo
Rafael Fernandes Cunha
OffRL
ReLM
LRM
142
0
0
24 Oct 2025
The Universal Landscape of Human Reasoning
Qiguang Chen
Jinhao Liu
L. Qin
Yimeng Zhang
Yihao Liang
...
Mengkang Hu
Yantao Du
Z. Chen
Xie Chen
Wanxiang Che
LRM
95
1
0
24 Oct 2025
Beyond Reasoning Gains: Mitigating General Capabilities Forgetting in Large Reasoning Models
Hoang Phan
Xianjun Yang
Kevin Yao
Jingyu Zhang
Shengjie Bi
Xiaocheng Tang
Madian Khabsa
Lijuan Liu
Deren Lei
OffRL
CLL
KELM
VLM
LRM
135
0
0
24 Oct 2025
Finding the Sweet Spot: Trading Quality, Cost, and Speed During Inference-Time LLM Reflection
Jack Butler
Nikita Kozodoi
Zainab Afolabi
Brian Tyacke
Gaiar Baimuratov
102
0
0
23 Oct 2025
Self-Jailbreaking: Language Models Can Reason Themselves Out of Safety Alignment After Benign Reasoning Training
Zheng-Xin Yong
Stephen H. Bach
LRM
250
0
0
23 Oct 2025
Limits of PRM-Guided Tree Search for Mathematical Reasoning with LLMs
Tristan Cinquin
Geoff Pleiss
Agustinus Kristiadi
AIMat
LRM
243
0
0
23 Oct 2025
What Defines Good Reasoning in LLMs? Dissecting Reasoning Steps with Multi-Aspect Evaluation
Heejin Do
Jaehui Hwang
Dongyoon Han
Seong Joon Oh
Sangdoo Yun
ELM
LRM
161
1
1
23 Oct 2025
LoongRL: Reinforcement Learning for Advanced Reasoning over Long Contexts
S. S. Wang
Gaokai Zhang
Li Zhang
Ning Shang
Fan Yang
Dongyao Chen
M. Yang
OffRL
RALM
ReLM
LRM
242
0
0
22 Oct 2025
CircuitSeer: Mining High-Quality Data by Probing Mathematical Reasoning Circuits in LLMs
Shaobo Wang
Yongliang Miao
Yuancheng Liu
Qianli Ma
Ning Liao
Linfeng Zhang
LRM
165
1
0
21 Oct 2025
Activating Visual Context and Commonsense Reasoning through Masked Prediction in VLMs
Jiaao Yu
Shenwei Li
Mingjie Han
Yifei Yin
Wenzheng Song
Chenghao Jia
Man Lan
OffRL
LRM
112
0
0
21 Oct 2025
Reasoning Language Model Inference Serving Unveiled: An Empirical Study
Qi Li
Junpan Wu
Xiang Liu
Yuxin Wang
Z. Li
Zhenheng Tang
Yuhan Chen
Shaohuai Shi
Xiaowen Chu
ReLM
LRM
256
1
0
21 Oct 2025
WebDevJudge: Evaluating (M)LLMs as Critiques for Web Development Quality
Chunyang Li
Yilun Zheng
Xinting Huang
Tianqing Fang
Jiahao Xu
Yangqiu Song
L. Chen
Han Hu
ELM
118
0
0
21 Oct 2025
What Makes a Good Curriculum? Disentangling the Effects of Data Ordering on LLM Mathematical Reasoning
Yaning Jia
Chunhui Zhang
Xingjian Diao
Xiangchi Yuan
Z. Ouyang
Chiyu Ma
Soroush Vosoughi
LRM
198
1
0
21 Oct 2025
Foundational Automatic Evaluators: Scaling Multi-Task Generative Evaluator Training for Reasoning-Centric Domains
Austin Xu
Xuan-Phi Nguyen
Yilun Zhou
Chien-Sheng Wu
Caiming Xiong
Shafiq Joty
OffRL
ALM
LRM
ELM
224
0
0
20 Oct 2025
Soft-Masked Diffusion Language Models
Michael Hersche
Samuel Moor-Smith
Thomas Hofmann
Abbas Rahimi
314
1
0
20 Oct 2025
Inference-Time Compute Scaling For Flow Matching
Adam Stecklov
Noah El Rimawi-Fine
Mathieu Blanchette
116
0
0
20 Oct 2025
Fine-tuning Flow Matching Generative Models with Intermediate Feedback
Jiajun Fan
Chaoran Cheng
Shuaike Shen
Xiangxin Zhou
Ge Liu
EGVM
161
1
0
20 Oct 2025
Certified Self-Consistency: Statistical Guarantees and Test-Time Training for Reliable Reasoning in LLMs
Paula Cordero-Encinar
Andrew Duncan
LRM
196
1
0
20 Oct 2025
Visual Autoregressive Models Beat Diffusion Models on Inference Time Scaling
Erik Riise
Mehmet Onurcan Kaya
Dim P. Papadopoulos
308
0
0
19 Oct 2025
DAG-Math: Graph-Guided Mathematical Reasoning in LLMs
Yuanhe Zhang
Ilja Kuzborskij
Jason D. Lee
Chenlei Leng
Fanghui Liu
LRM
154
1
0
19 Oct 2025
A Comprehensive Survey on Reinforcement Learning-based Agentic Search: Foundations, Roles, Optimizations, Evaluations, and Applications
Minhua Lin
Zongyu Wu
Zhichao Xu
Hui Liu
Xianfeng Tang
Qi He
Charu C. Aggarwal
Hui Liu
Xiang Zhang
Suhang Wang
AI4TS
LRM
559
2
0
19 Oct 2025
Can Knowledge-Graph-based Retrieval Augmented Generation Really Retrieve What You Need?
Junchi Yu
Y. Liu
Jindong Gu
Philip Torr
Dongzhan Zhou
RALM
211
1
0
18 Oct 2025