2305.20050
Cited By
Let's Verify Step by Step
International Conference on Learning Representations (ICLR), 2023
31 May 2023
Hunter Lightman
V. Kosaraju
Yura Burda
Harrison Edwards
Bowen Baker
Teddy Lee
Jan Leike
John Schulman
Ilya Sutskever
K. Cobbe
ALM
OffRL
LRM
Papers citing "Let's Verify Step by Step"
50 / 1,389 papers shown
Video Generation Models Are Good Latent Reward Models
Xiaoyue Mi
W. Yu
Jiesong Lian
Shibo Jie
Ruizhe Zhong
...
Z. Zhou
Zhiyong Xu
Yuan Zhou
Qinglin Lu
Fan Tang
EGVM
VGen
145
0
0
24 Dec 2025
Select2Reason: Efficient Instruction-Tuning Data Selection for Long-CoT Reasoning
Cehao Yang
Xueyuan Lin
Chengjin Xu
Xuhui Jiang
Xiaojun Wu
Honghao Liu
Hui Xiong
Jian Guo
LRM
243
1
0
24 Dec 2025
C²GSPG: Confidence-calibrated Group Sequence Policy Gradient towards Self-aware Reasoning
Haotian Liu
Shuo Wang
Hongteng Xu
LRM
101
0
0
24 Dec 2025
When Does Verification Pay Off? A Closer Look at LLMs as Solution Verifiers
Jack Lu
Ryan Teehan
Jinran Jin
Mengye Ren
LRM
84
0
0
02 Dec 2025
SR-GRPO: Stable Rank as an Intrinsic Geometric Reward for Large Language Model Alignment
Yixuan Tang
Yi Yang
ALM
104
0
0
02 Dec 2025
Cross-Lingual Prompt Steerability: Towards Accurate and Robust LLM Behavior across Languages
Lechen Zhang
Yusheng Zhou
Tolga Ergen
Lajanugen Logeswaran
Moontae Lee
David Jurgens
LRM
68
0
0
02 Dec 2025
Beyond SFT: Reinforcement Learning for Safer Large Reasoning Models with Better Reasoning Ability
Jinghan Jia
Nathalie Baracaldo
Sijia Liu
OffRL
ReLM
LRM
180
0
0
01 Dec 2025
Optimizing Generative Ranking Relevance via Reinforcement Learning in Xiaohongshu Search
Ziyang Zeng
Heming Jing
Jindong Chen
X. Li
Hongyu Liu
...
Yuqing Yang
Shaosheng Cao
Jun Fan
Yi-Chen Wu
Yao Hu
LRM
92
0
0
30 Nov 2025
Adversarial Training for Process Reward Models
Gurusha Juneja
Deepak Nathani
William Yang Wang
LRM
76
0
0
28 Nov 2025
Every Token Counts: Generalizing 16M Ultra-Long Context in Large Language Models
X. S. Hu
Zhanchao Zhou
Ruiqi Liang
Zehuan Li
Wei Wu
Jianguo Li
88
0
0
28 Nov 2025
From Illusion to Intention: Visual Rationale Learning for Vision-Language Reasoning
C. Wang
Haozhe Wang
Xi Chen
J. Liu
Taofeng Xue
Chong Peng
Donglian Qi
Fangzhen Lin
Yunfeng Yan
OffRL
LRM
203
0
0
28 Nov 2025
A Unified Evaluation-Instructed Framework for Query-Dependent Prompt Optimization
Ke Chen
Yifeng Wang
Hassan Almosapeeh
Haohan Wang
144
0
0
25 Nov 2025
RPM-MCTS: Knowledge-Retrieval as Process Reward Model with Monte Carlo Tree Search for Code Generation
Yuanyuan Lin
Xiangyu Ouyang
Teng Zhang
Kaixin Sui
120
0
0
25 Nov 2025
CodeV: Code with Images for Faithful Visual Reasoning via Tool-Aware Policy Optimization
X. Hou
Shaoyuan Xu
Manan Biyani
Mayan Li
Jia-Wei Liu
Todd C. Hollon
Bryan Wang
97
0
0
24 Nov 2025
Think Before You Prune: Selective Self-Generated Calibration for Pruning Large Reasoning Models
Yang Xiang
Yixin Ji
Juntao Li
Min Zhang
LRM
80
0
0
24 Nov 2025
Majority of the Bests: Improving Best-of-N via Bootstrapping
Amin Rakhsha
Kanika Madan
Tianyu Zhang
Amir-massoud Farahmand
Amir Khasahmadi
120
0
0
23 Nov 2025
Transformers with RL or SFT Provably Learn Sparse Boolean Functions, But Differently
Bochen Lyu
Yiyang Jia
Xiaohao Cai
Zhanxing Zhu
MoE
108
0
0
22 Nov 2025
The PLLuM Instruction Corpus
Piotr Pęzik
Filip Żarnecki
Konrad Kaczyński
A. Cichosz
Zuzanna Deckert
...
Konrad Wojtasik
Arkadiusz Janz
P. Kazienko
Julia Moska
Jan Kocoń
76
0
0
21 Nov 2025
Asking LLMs to Verify First is Almost Free Lunch
Shiguang Wu
Quanming Yao
ReLM
LRM
116
0
0
21 Nov 2025
VideoSeg-R1: Reasoning Video Object Segmentation via Reinforcement Learning
Zishan Xu
Yifu Guo
Y. Lu
Fengyu Yang
J. Li
VOS
180
0
0
20 Nov 2025
JudgeBoard: Benchmarking and Enhancing Small Language Models for Reasoning Evaluation
Zhenyu Bi
Gaurav Srivastava
Yang Li
Meng Lu
Swastik Roy
Morteza Ziyadi
Xuan Wang
ELM
236
0
0
20 Nov 2025
Cognitive Foundations for Reasoning and Their Manifestation in LLMs
Priyanka Kargupta
Shuyue Stella Li
Haocheng Wang
Jinu Lee
Shan Chen
...
Thomas L. Griffiths
Max Kleiman-Weiner
Jiawei Han
Asli Celikyilmaz
Yulia Tsvetkov
LRM
174
0
0
20 Nov 2025
Incorporating Self-Rewriting into Large Language Model Reasoning Reinforcement
Jiashu Yao
Heyan Huang
Shuang Zeng
Chuwei Luo
Wangjie You
Jie Tang
Qingsong Liu
Yuhang Guo
Yangyang Kang
ReLM
KELM
248
0
0
20 Nov 2025
Nemotron Elastic: Towards Efficient Many-in-One Reasoning LLMs
Ali Taghibakhshi
Sharath Turuvekere Sreenivas
Saurav Muralidharan
Ruisi Cai
Marcin Chochowski
...
Jan Kautz
Bryan Catanzaro
Ashwath Aithal
Nima Tajbakhsh
Pavlo Molchanov
68
0
0
20 Nov 2025
Thinking, Faithful and Stable: Mitigating Hallucinations in LLMs
Chelsea Zou
Yiheng Yao
Basant Khalil
HILM
136
0
0
19 Nov 2025
From Solving to Verifying: A Unified Objective for Robust Reasoning in LLMs
Xiaoxuan Wang
Bo Liu
Song Jiang
Jingzhou Liu
Jingyuan Qi
Xia Chen
Baosheng He
LRM
152
0
0
19 Nov 2025
Step-Audio-R1 Technical Report
Fei Tian
Xiangyu Zhang
Y. Zhang
Haoyang Zhang
Yuxin Li
...
Eng Siong Chng
Xuerui Yang
Xiangyu Zhang
Daxin Jiang
Gang Yu
AuLLM
LRM
287
0
0
19 Nov 2025
EntroPIC: Towards Stable Long-Term Training of LLMs via Entropy Stabilization with Proportional-Integral Control
Kai Yang
Xin Xu
Yangkun Chen
Weijie Liu
Jiafei Lyu
Zichuan Lin
Deheng Ye
Saiyong Yang
205
1
0
19 Nov 2025
Agent-R1: Training Powerful LLM Agents with End-to-End Reinforcement Learning
Mingyue Cheng
Jie Ouyang
Shuo Yu
Ruiran Yan
Yucong Luo
Zirui Liu
Daoyu Wang
Qi Liu
Enhong Chen
100
3
0
18 Nov 2025
Empowering Multi-Turn Tool-Integrated Reasoning with Group Turn Policy Optimization
Yifeng Ding
Hung Le
Songyang Han
Kangrui Ruan
Zhenghui Jin
Varun Kumar
Zijian Wang
Anoop Deoras
LRM
203
0
0
18 Nov 2025
Reasoning Shapes Alignment: Investigating Cultural Alignment in Large Reasoning Models with Cultural Norms
Yuhang Wang
Yanxu Zhu
Jitao Sang
LRM
153
0
0
17 Nov 2025
Scaling Generative Verifiers For Natural Language Mathematical Proof Verification And Selection
Sadegh Mahdavi
Branislav Kisacanin
Shubham Toshniwal
Wei Du
Ivan Moshkov
George Armstrong
Renjie Liao
Christos Thrampoulidis
Igor Gitman
ALM
LRM
209
2
0
17 Nov 2025
CriticSearch: Fine-Grained Credit Assignment for Search Agents via a Retrospective Critic
Yaocheng Zhang
Haohuan Huang
Zijun Song
Yuanheng Zhu
Qichao Zhang
Zijie Zhao
Dongbin Zhao
OffRL
LRM
124
0
0
15 Nov 2025
Reward and Guidance through Rubrics: Promoting Exploration to Improve Multi-Domain Reasoning
Baolong Bi
Shenghua Liu
Yiwei Wang
Siqian Tong
Lingrui Mei
Yuyao Ge
Yilong Xu
Jiafeng Guo
Xueqi Cheng
OffRL
LRM
196
3
0
15 Nov 2025
Better LLM Reasoning via Dual-Play
Zhengxin Zhang
Chengyu Huang
Aochong Oliver Li
Claire Cardie
OffRL
LRM
144
0
0
14 Nov 2025
Uncertainty-Guided Checkpoint Selection for Reinforcement Finetuning of Large Language Models
Manh Trong Nguyen
D. Nguyen
Dai Do
Svetha Venkatesh
Hung Le
120
0
0
13 Nov 2025
mmJEE-Eval: A Bilingual Multimodal Benchmark for Evaluating Scientific Reasoning in Vision-Language Models
Arka Mukherjee
Shreya Ghosh
LRM
156
0
0
12 Nov 2025
The Path Not Taken: RLVR Provably Learns Off the Principals
Hanqing Zhu
Zhenyu Zhang
Hanxian Huang
DiJia Su
Zechun Liu
...
Jinwon Lee
David Z. Pan
Zinan Lin
Yuandong Tian
Kai Sheng Tai
142
2
0
11 Nov 2025
Multimodal LLMs Do Not Compose Skills Optimally Across Modalities
Paula Ontalvilla
Aitor Ormazabal
Gorka Azkune
109
0
0
11 Nov 2025
AgentPRM: Process Reward Models for LLM Agents via Step-Wise Promise and Progress
Zhiheng Xi
Chenyang Liao
Guanyu Li
Y. Yang
Wenxiang Chen
...
Wei Wu
Tao Ji
Tao Gui
Qi Zhang
Xuanjing Huang
LRM
96
0
0
11 Nov 2025
DPRM: A Dual Implicit Process Reward Model in Multi-Hop Question Answering
Xinyi Wang
Yiping Song
Zhiliang Tian
Bo Liu
Tingjin Luo
Shiyu Huang
LRM
140
0
0
11 Nov 2025
Test-time Diverse Reasoning by Riemannian Activation Steering
Ly Tran Ho Khanh
Dongxuan Zhu
Man-Chung Yue
Viet Anh Nguyen
LLMSV
259
0
0
11 Nov 2025
MathSE: Improving Multimodal Mathematical Reasoning via Self-Evolving Iterative Reflection and Reward-Guided Fine-Tuning
Jinhao Chen
Zhen Yang
Jianxin Shi
Tianyu Wo
J. Tang
ReLM
LRM
224
0
0
10 Nov 2025
Textual Self-attention Network: Test-Time Preference Optimization through Textual Gradient-based Attention
Shibing Mo
Haoyang Ruan
Kai Wu
Jing Liu
157
0
0
10 Nov 2025
TinyChemVL: Advancing Chemical Vision-Language Models via Efficient Visual Token Reduction and Complex Reaction Tasks
Xuanle Zhao
Shuxin Zeng
Yinyuan Cai
Xiang Cheng
Duzhen Zhang
Xiuyi Chen
Bo Xu
124
0
0
09 Nov 2025
EASE: Practical and Efficient Safety Alignment for Small Language Models
Haonan Shi
Guoli Wang
Tu Ouyang
An Wang
LRM
181
0
0
09 Nov 2025
Confidence-Guided Stepwise Model Routing for Cost-Efficient Reasoning
Sangmook Lee
Dohyung Kim
Hyukhun Koh
Nakyeong Yang
Kyomin Jung
LRM
143
0
0
09 Nov 2025
Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B
Sen Xu
Yi Zhou
Wei Wang
Jixin Min
Z. Yin
Yingwei Dai
Shixi Liu
Lianyu Pang
Yirong Chen
J. Zhang
MoE
LRM
VLM
145
0
0
09 Nov 2025
What Makes Reasoning Invalid: Echo Reflection Mitigation for Large Language Models
Chen He
Xun Jiang
Lei Wang
Hao-ran Yang
Chong Peng
Peng Yan
Fumin Shen
Xing Xu
LRM
200
0
0
09 Nov 2025
Test-Time Iterative Error Correction for Efficient Diffusion Models
Yunshan Zhong
Yanwei Qi
Yuxin Zhang
136
0
0
09 Nov 2025