2305.20050
Cited By
Let's Verify Step by Step
International Conference on Learning Representations (ICLR), 2023
31 May 2023
Hunter Lightman
V. Kosaraju
Yura Burda
Harrison Edwards
Bowen Baker
Teddy Lee
Jan Leike
John Schulman
Ilya Sutskever
K. Cobbe
ALM
OffRL
LRM
Papers citing "Let's Verify Step by Step"
50 / 1,389 papers shown
Title
QGraphLIME - Explaining Quantum Graph Neural Networks
Haribandhu Jena
Jyotirmaya Shivottam
Subhankar Mishra
FAtt
233
0
0
07 Oct 2025
VideoMiner: Iteratively Grounding Key Frames of Hour-Long Videos via Tree-based Group Relative Policy Optimization
Xinye Cao
Hongcan Guo
Jiawen Qian
Guoshun Nan
Chao Wang
Yuqi Pan
Tianhao Hou
X. Wang
Yutong Gao
VGen
132
0
0
07 Oct 2025
BrokenMath: A Benchmark for Sycophancy in Theorem Proving with LLMs
Ivo Petrov
Jasper Dekoninck
Martin Vechev
126
1
0
06 Oct 2025
Where Did It All Go Wrong? A Hierarchical Look into Multi-Agent Error Attribution
Adi Banerjee
Anirudh Nair
Tarik Borogovac
LLMAG
151
1
0
06 Oct 2025
Bridging Reasoning to Learning: Unmasking Illusions using Complexity Out of Distribution Generalization
Mohammad Mahdi Samiei Paqaleh
Arash Marioriyad
Arman Tahmasebi-Zadeh
Mohamadreza Fereydooni
Mahdi Ghaznavai
Mahdieh Soleymani Baghshah
116
0
0
06 Oct 2025
Learning on the Job: Test-Time Curricula for Targeted Reinforcement Learning
Jonas Hübotter
Leander Diaz-Bone
Ido Hakimi
Andreas Krause
Moritz Hardt
147
0
0
06 Oct 2025
Test-Time Scaling in Diffusion LLMs via Hidden Semi-Autoregressive Experts
Jihoon Lee
Hoyeon Moon
Kevin Zhai
Arun Kumar Chithanar
Anit Kumar Sahu
S. Kar
Chul Lee
Souradip Chakraborty
Amrit Singh Bedi
DiffM
184
0
0
06 Oct 2025
Toward a unified framework for data-efficient evaluation of large language models
Lele Liao
Qile Zhang
Ruofan Wu
Guanhua Fang
80
0
0
05 Oct 2025
Best of mini-N in-loop Sampling: A Contextual Quality Reward Model for Reliable and Efficient Best-of-N Sampling
Hyung Gyu Rho
Sian Lee
113
0
0
05 Oct 2025
VortexPIA: Indirect Prompt Injection Attack against LLMs for Efficient Extraction of User Privacy
Yu Cui
Sicheng Pan
Yifei Liu
Haibin Zhang
Cong Zuo
125
2
0
05 Oct 2025
Unlocking Reasoning Capabilities in LLMs via Reinforcement Learning Exploration
Wenhao Deng
Long Wei
Chenglei Yu
Tailin Wu
OffRL
ReLM
LRM
237
2
0
04 Oct 2025
REG: A Regularization Optimizer for Robust Training Dynamics
Zehua Liu
Han Wu
Xiaojin Fu
Shuqi Liu
Xiongwei Han
Tao Zhong
Mingxuan Yuan
94
0
0
04 Oct 2025
Beyond Token Length: Step Pruner for Efficient and Accurate Reasoning in Large Language Models
Canhui Wu
Qiong Cao
Chang Li
Z. J. Wang
Chao Xue
Yuwei Fan
Wei Xi
Xiaodong He
LRM
120
0
0
04 Oct 2025
Reward Models are Metrics in a Trench Coat
Sebastian Gehrmann
136
0
0
03 Oct 2025
Efficient Test-Time Scaling for Small Vision-Language Models
Mehmet Onurcan Kaya
Desmond Elliott
Dim P. Papadopoulos
VLM
158
2
0
03 Oct 2025
Best-of-Majority: Minimax-Optimal Strategy for Pass@k Inference Scaling
Qiwei Di
Kaixuan Ji
Xuheng Li
Heyang Zhao
Quanquan Gu
100
0
0
03 Oct 2025
NCV: A Node-Wise Consistency Verification Approach for Low-Cost Structured Error Localization in LLM Reasoning
Yulong Zhang
Li Wang
Wei Du
Peilin Li
Yuqin Dai
Zhiyuan Zhao
Lingyong Fang
Ziniu Liu
Ru Zhang
Huijia Zhu
Gongshen Liu
OffRL
LRM
88
0
0
03 Oct 2025
Consolidating Reinforcement Learning for Multimodal Discrete Diffusion Models
Tianren Ma
Mu Zhang
Yibing Wang
Qixiang Ye
69
1
0
03 Oct 2025
AdvEvo-MARL: Shaping Internalized Safety through Adversarial Co-Evolution in Multi-Agent Reinforcement Learning
Zhenyu Pan
Y. Zhang
Zhuo Liu
Y. Tang
Zeliang Zhang
...
Haoyang Fang
Manling Li
Chenliang Xu
Philip S. Yu
Han Liu
AAML
161
0
0
02 Oct 2025
Enhancing Large Language Model Reasoning with Reward Models: An Analytical Survey
Qiyuan Liu
Hao Xu
Xuhong Chen
Wei Chen
Yee Whye Teh
Ning Miao
ReLM
LRM
AI4CE
274
0
0
02 Oct 2025
The Reasoning Boundary Paradox: How Reinforcement Learning Constrains Language Models
Phuc Minh Nguyen
Chinh D. La
Duy M. Nguyen
Nitesh Chawla
Binh T. Nguyen
Khoa D. Doan
ReLM
LRM
779
0
1
02 Oct 2025
The Unreasonable Effectiveness of Scaling Agents for Computer Use
Gonzalo Gonzalez-Pumariega
Vincent Tu
Chih-Lun Lee
Jiachen Yang
Ang Li
Xin Eric Wang
124
3
0
02 Oct 2025
Asymmetric Proximal Policy Optimization: mini-critics boost LLM reasoning
Jiashun Liu
J. Obando-Ceron
Han Lu
Yancheng He
Weixun Wang
Wenbo Su
Bo Zheng
Pablo Samuel Castro
Aaron Courville
L. Pan
OffRL
AAML
284
0
0
02 Oct 2025
CLUE: Non-parametric Verification from Experience via Hidden-State Clustering
Zhenwen Liang
Ruosen Li
Yujun Zhou
Linfeng Song
Dian Yu
Xinya Du
Haitao Mi
Dong Yu
100
0
0
02 Oct 2025
How to Train Your Advisor: Steering Black-Box LLMs with Advisor Models
Parth Asawa
Alan Zhu
Matei A. Zaharia
A. Dimakis
Joseph E. Gonzalez
OffRL
LRM
99
1
0
02 Oct 2025
Learning a Dense Reasoning Reward Model from Expert Demonstration via Inverse Reinforcement Learning
Claudio Fanconi
Nicolás Astorga
M. Schaar
LRM
153
1
1
02 Oct 2025
ThinKV: Thought-Adaptive KV Cache Compression for Efficient Reasoning Models
Akshat Ramachandran
Marina Neseem
Charbel Sakr
Rangharajan Venkatesan
Brucek Khailany
Tushar Krishna
MQ
LRM
VLM
137
1
1
01 Oct 2025
Generalized Parallel Scaling with Interdependent Generations
Harry Dong
David Brandfonbrener
Eryk Helenowski
Yun He
Mrinal Kumar
Han Fang
Yuejie Chi
Karthik Abinav Sankararaman
LRM
132
0
0
01 Oct 2025
Simultaneous Multi-objective Alignment Across Verifiable and Non-verifiable Rewards
Yiran Shen
Yu Xia
Jonathan D. Chang
Prithviraj Ammanabrolu
128
0
0
01 Oct 2025
RiskPO: Risk-based Policy Optimization via Verifiable Reward for LLM Post-Training
Tao Ren
Jinyang Jiang
Hui Yang
Wan Tian
Minhao Zou
...
Shentao Qin
Yanjun Zhao
Rui Tao
Hui Shao
Yijie Peng
97
0
0
01 Oct 2025
Stabilizing Policy Gradients for Sample-Efficient Reinforcement Learning in LLM Reasoning
Luckeciano C. Melo
Alessandro Abate
Yarin Gal
LRM
81
0
0
01 Oct 2025
On Predictability of Reinforcement Learning Dynamics for Large Language Models
Yuchen Cai
Ding Cao
Xin Xu
Zijun Yao
Yuqing Huang
Zhenyu Tan
Benyi Zhang
Guiquan Liu
Junfeng Fang
119
0
0
01 Oct 2025
LSPO: Length-aware Dynamic Sampling for Policy Optimization in LLM Reasoning
Weizhe Chen
Sven Koenig
B. Dilkina
97
0
0
01 Oct 2025
mR3: Multilingual Rubric-Agnostic Reward Reasoning Models
David Anugraha
Shou-Yi Hung
Zilu Tang
Annie En-Shiun Lee
Derry Wijaya
Genta Indra Winata
LRM
418
2
0
01 Oct 2025
Graph-S3: Enhancing Agentic textual Graph Retrieval with Synthetic Stepwise Supervision
Ge Chang
Jinbo Su
Jiacheng Liu
Pengfei Yang
Yuhao Shang
Huiwen Zheng
Hongli Ma
Yan Liang
Y. Li
Yunxin Liu
RALM
LRM
102
0
0
01 Oct 2025
Prosperity before Collapse: How Far Can Off-Policy RL Reach with Stale Data on LLMs?
Haizhong Zheng
Jiawei Zhao
Bedi Chen
OffRL
96
2
0
01 Oct 2025
Expected Attention: KV Cache Compression by Estimating Attention from Future Queries Distribution
Alessio Devoto
Maximilian Jeblick
Simon Jégou
MQ
VLM
88
2
0
01 Oct 2025
Large Reasoning Models Learn Better Alignment from Flawed Thinking
ShengYun Peng
Eric Michael Smith
Ivan Evtimov
Song Jiang
Pin-Yu Chen
Hongyuan Zhan
Haozhu Wang
Duen Horng Chau
Mahesh Pasupuleti
Jianfeng Chi
OffRL
LRM
148
3
0
01 Oct 2025
Reinforcement Learning with Verifiable yet Noisy Rewards under Imperfect Verifiers
Xin-Qiang Cai
Wei Wang
Feng Liu
Tongliang Liu
Gang Niu
Masashi Sugiyama
OffRL
AAML
204
1
0
01 Oct 2025
RoRecomp: Enhancing Reasoning Efficiency via Rollout Response Recomposition in Reinforcement Learning
Gang Li
Yulei Qin
Xiaoyu Tan
Dingkang Yang
Yuchen Shi
Zihan Xu
Xiang Li
Xing Sun
Ke Li
OffRL
ReLM
LRM
238
0
0
30 Sep 2025
Attention as a Compass: Efficient Exploration for Process-Supervised RL in Reasoning Models
Runze Liu
Jiakang Wang
Yuling Shi
Zhihui Xie
Chenxin An
...
Wenping Hu
Xiu Li
Fuzheng Zhang
Guorui Zhou
Kun Gai
OffRL
LRM
126
3
0
30 Sep 2025
Latent Thinking Optimization: Your Latent Reasoning Language Model Secretly Encodes Reward Signals in Its Latent Thoughts
Hanwen Du
Yuxin Dong
Xia Ning
LRM
AI4CE
146
1
0
30 Sep 2025
GRPO-λ: Credit Assignment improves LLM Reasoning
Prasanna Parthasarathi
Mathieu Reymond
Boxing Chen
Yufei Cui
Sarath Chandar
LRM
137
1
1
30 Sep 2025
Learning to Ponder: Adaptive Reasoning in Latent Space
Yixin He
Lumingyuan Tang
LRM
90
1
0
29 Sep 2025
From Static to Dynamic: Adaptive Monte Carlo Search for Mathematical Process Supervision
Jie Ma
Shihao Qi
Rui Xing
Ziang Yin
Bifan Wei
Jun Liu
Tongliang Liu
AI4TS
LRM
112
0
0
29 Sep 2025
DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search
Fang Wu
Weihao Xuan
Heli Qi
Ximing Lu
Aaron Tu
Li Erran Li
Yejin Choi
OOD
LRM
160
0
0
29 Sep 2025
TimeOmni-1: Incentivizing Complex Reasoning with Time Series in Large Language Models
Tong Guan
Zijie Meng
Dianqi Li
Shiyu Wang
Chao-Han Huck Yang
Qingsong Wen
Zuozhu Liu
Sabato Marco Siniscalchi
Ming Jin
Shirui Pan
AI4TS
BDL
LRM
105
0
0
29 Sep 2025
Towards Safe Reasoning in Large Reasoning Models via Corrective Intervention
Yichi Zhang
Yue Ding
Jingwen Yang
Tianwei Luo
Dongbai Li
Ranjie Duan
Qiang Liu
Hang Su
Yinpeng Dong
Jun Zhu
LRM
97
1
0
29 Sep 2025
From Perception to Cognition: A Survey of Vision-Language Interactive Reasoning in Multimodal Large Language Models
Chenyue Zhou
Mingxuan Wang
Yanbiao Ma
Chenxu Wu
Wanyi Chen
...
Guoli Jia
Lingling Li
Z. Lu
Y. Lu
Wenhan Luo
LRM
407
9
0
29 Sep 2025
World-Env: Leveraging World Model as a Virtual Environment for VLA Post-Training
Junjin Xiao
Y. Yang
Xinyuan Chang
Ronghan Chen
Feng Xiong
Mu Xu
Wei-Shi Zheng
Qing Zhang
VLM
223
6
0
29 Sep 2025