arXiv:2305.20050
Let's Verify Step by Step
International Conference on Learning Representations (ICLR), 2023 · 31 May 2023
Hunter Lightman, V. Kosaraju, Yura Burda, Harrison Edwards, Bowen Baker, Teddy Lee, Jan Leike, John Schulman, Ilya Sutskever, K. Cobbe
[ALM, OffRL, LRM]
Papers citing "Let's Verify Step by Step" (showing 50 of 1,389)
Hybrid Reward Normalization for Process-supervised Non-verifiable Agentic Tasks (Peiran Xu, Ruoyao Xiao, Xiaoying Xing, Guannan Zhang, Debiao Li, Kunyu Shi) [OffRL, LRM], 29 Sep 2025

Towards Safe Reasoning in Large Reasoning Models via Corrective Intervention (Yichi Zhang, Yue Ding, Jingwen Yang, Tianwei Luo, Dongbai Li, Ranjie Duan, Qiang Liu, Hang Su, Yinpeng Dong, Jun Zhu) [LRM], 29 Sep 2025

RFG: Test-Time Scaling for Diffusion Large Language Model Reasoning with Reward-Free Guidance (Tianlang Chen, Minkai Xu, Jure Leskovec, Stefano Ermon) [LRM, AI4CE], 29 Sep 2025

World-Env: Leveraging World Model as a Virtual Environment for VLA Post-Training (Junjin Xiao, Y. Yang, Xinyuan Chang, Ronghan Chen, Feng Xiong, Mu Xu, Wei-Shi Zheng, Qing Zhang) [VLM], 29 Sep 2025

TimeOmni-1: Incentivizing Complex Reasoning with Time Series in Large Language Models (Tong Guan, Zijie Meng, Dianqi Li, Shiyu Wang, Chao-Han Huck Yang, Qingsong Wen, Zuozhu Liu, Sabato Marco Siniscalchi, Ming Jin, Shirui Pan) [AI4TS, BDL, LRM], 29 Sep 2025
ContextPRM: Leveraging Contextual Coherence for multi-domain Test-Time Scaling (Haotian Zhang, Liu Liu, B. Yu, Jiayan Qiu, Likang Xiao, Yanwei Ren, Quan Chen, Xianglong Liu) [LRM], 29 Sep 2025

Taming Masked Diffusion Language Models via Consistency Trajectory Reinforcement Learning with Fewer Decoding Step (Jingyi Yang, Guanxu Chen, Xuhao Hu, Jing Shao), 28 Sep 2025

Large-Scale Constraint Generation - Can LLMs Parse Hundreds of Constraints? (Matteo Boffa, Jiaxuan You), 28 Sep 2025

From Reasoning to Answer: Empirical, Attention-Based and Mechanistic Insights into Distilled DeepSeek R1 Models (Jue Zhang, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang) [LRM], 28 Sep 2025

Reasoning Scaffolding: Distilling the Flow of Thought from LLMs (Xiangyu Wen, Junhua Huang, Zeju Li, Min Li, Jianyuan Zhong, Zhijian Xu, Mingxuan Yuan, Yongxiang Huang, Q. Xu) [LRM], 28 Sep 2025

Toward Effective Tool-Integrated Reasoning via Self-Evolved Preference Learning (Yifei Chen, Guanting Dong, Zhicheng Dou) [LRM], 27 Sep 2025
d²Cache: Accelerating Diffusion-Based LLMs via Dual Adaptive Caching (Yuchu Jiang, Yue Cai, Xiangzhong Luo, Jiale Fu, Jiarui Wang, Chonghan Liu, Xu Yang), 27 Sep 2025

General Exploratory Bonus for Optimistic Exploration in RLHF (W. Li, Changdae Oh, Yixuan Li) [AI4CE], 27 Sep 2025

Training Vision-Language Process Reward Models for Test-Time Scaling in Multimodal Reasoning: Key Insights and Lessons Learned (Brandon Ong, Tej Deep Pala, Vernon Y.H. Toh, William-Chandra Tjhi, Soujanya Poria) [LRM, VLM], 27 Sep 2025

Your Models Have Thought Enough: Training Large Reasoning Models to Stop Overthinking (Jinyi Han, Ying Huang, Ying Liao, Zishang Jiang, Xikun Lu, ..., Jiaqing Liang, Weikang Zhou, Zeye Sun, Fei Yu, Yanghua Xiao) [OffRL, LRM], 27 Sep 2025

Learning to Reason in Structured In-context Environments with Reinforcement Learning (Peng Yu, Zeyuan Zhao, Shao Zhang, Luoyi Fu, Xinbing Wang, Ying Wen) [OffRL, LRM], 27 Sep 2025

Learning How to Use Tools, Not Just When: Pattern-Aware Tool-Integrated Reasoning (Ningning Xu, Yuxuan Jiang, Shubhashis Roy Dipta) [LRM], 27 Sep 2025
Cognition-of-Thought Elicits Social-Aligned Reasoning in Large Language Models (Xuanming Zhang, Yuxuan Chen, Min-Hsuan Yeh, Yixuan Li) [LRM], 27 Sep 2025

Exploring Solution Divergence and Its Effect on Large Language Model Problem Solving (Hang Li, Kaiqi Yang, Yucheng Chu, Hui Liu, Shucheng Zhou) [MoMe, LRM], 26 Sep 2025

Variational Reasoning for Language Models (Xiangxin Zhou, Zichen Liu, Haonan Wang, Chao Du, Min Lin, Chongxuan Li, Liang Wang, Tianyu Pang) [OffRL, LRM], 26 Sep 2025

Language Models Can Learn from Verbal Feedback Without Scalar Rewards (Renjie Luo, Zichen Liu, Xiangyan Liu, Chao Du, Min Lin, Wenhu Chen, Wei Lu, Tianyu Pang) [OffRL], 26 Sep 2025

Solving the Granularity Mismatch: Hierarchical Preference Learning for Long-Horizon LLM Agents (Heyang Gao, Guoqing Liu, Erxue Min, Hengyi Cai, Shuaiqiang Wang, Dawei Yin, Xu Chen), 26 Sep 2025

Dynamic Experts Search: Enhancing Reasoning in Mixture-of-Experts LLMs at Test Time (Yixuan Han, Fan Ma, Ruijie Quan, Yi Yang) [MoE, LRM], 26 Sep 2025
The Rogue Scalpel: Activation Steering Compromises LLM Safety (Anton Korznikov, Andrey V. Galichin, Alexey Dontsov, Oleg Y. Rogov, Ivan Oseledets, Elena Tutubalina) [LLMSV, AAML], 26 Sep 2025

R-Capsule: Compressing High-Level Plans for Efficient Large Language Model Reasoning (Hongyu Shan, Mingyang Song, Chang Dai, Di Liang, Han Chen) [LRM], 26 Sep 2025

From Long to Lean: Performance-aware and Adaptive Chain-of-Thought Compression via Multi-round Refinement (Jianzhi Yan, Le Liu, Youcheng Pan, Shiwei Chen, Zike Yuan, Yang Xiang, Buzhou Tang) [MQ, LRM], 26 Sep 2025

CE-GPPO: Coordinating Entropy via Gradient-Preserving Clipping Policy Optimization in Reinforcement Learning (Zhenpeng Su, Leiyu Pan, Minxuan Lv, Yuntao Li, Wenping Hu, Fuzheng Zhang, Kun Gai, Guorui Zhou), 25 Sep 2025

GALAX: Graph-Augmented Language Model for Explainable Reinforcement-Guided Subgraph Reasoning in Precision Medicine (Heming Zhang, Di Huang, Wenyu Li, Michael Province, Yihao Chen, Philip R. O. Payne, Fuhai Li) [LRM], 25 Sep 2025

Reinforcement Learning Fine-Tuning Enhances Activation Intensity and Diversity in the Internal Circuitry of LLMs (Honglin Zhang, Qianyue Hao, Fengli Xu, Yong Li), 25 Sep 2025
GRPO is Secretly a Process Reward Model (Michael Sullivan), 25 Sep 2025

WeFT: Weighted Entropy-driven Fine-Tuning for dLLMs (Guowei Xu, Wenxin Xu, Jiawang Zhao, Kaisheng Ma) [DiffM], 25 Sep 2025

Tree Search for LLM Agent Reinforcement Learning (Yuxiang Ji, Ziyu Ma, Yong Wang, Guanhua Chen, Xiangxiang Chu, Liaoni Wu), 25 Sep 2025

MMR1: Enhancing Multimodal Reasoning with Variance-Aware Sampling and Open Resources (Sicong Leng, Jing Wang, Jiaxi Li, Hao Zhang, Zhiqiang Hu, ..., Deli Zhao, Wei Lu, Yu Rong, Aixin Sun, Shijian Lu) [OffRL, LRM], 25 Sep 2025

Bounds of Chain-of-Thought Robustness: Reasoning Steps, Embed Norms, and Beyond (Dingzirui Wang, Xuanliang Zhang, Keyan Xu, Qingfu Zhu, Wanxiang Che, Yang Deng) [LRM], 25 Sep 2025

d2: Improved Techniques for Training Reasoning Diffusion Language Models (Guanghan Wang, Yair Schiff, Gilad Turok, Volodymyr Kuleshov) [DiffM, OffRL, LRM], 25 Sep 2025
Distilling Many-Shot In-Context Learning into a Cheat Sheet (Ukyo Honda, Soichiro Murakami, Peinan Zhang), 25 Sep 2025

Thinking Augmented Pre-training (Liang Wang, Nan Yang, Shaohan Huang, Li Dong, Furu Wei) [LRM], 24 Sep 2025

Are We Scaling the Right Thing? A System Perspective on Test-Time Scaling (Youpeng Zhao, Jinpeng LV, Di Wu, Jun Wang, Christopher Gooley) [LRM], 23 Sep 2025

HyperAdapt: Simple High-Rank Adaptation (Abel Gurung, Joseph Campbell), 23 Sep 2025

Unveiling Chain of Step Reasoning for Vision-Language Models with Fine-grained Rewards (Honghao Chen, Xingzhou Lou, Xiaokun Feng, Kaiqi Huang, Xinlong Wang) [OffRL, LRM], 23 Sep 2025

Variation in Verification: Understanding Verification Dynamics in Large Language Models (Yefan Zhou, Austin Xu, Yilun Zhou, Janvijay Singh, Jiang Gui, Shafiq Joty) [LRM], 22 Sep 2025

Exploiting Tree Structure for Credit Assignment in RL Training of LLMs (Hieu Tran, Zonghai Yao, Hong-ye Yu) [OffRL], 22 Sep 2025
Adaptive Overclocking: Dynamic Control of Thinking Path Length via Real-Time Reasoning Signals (Shuhao Jiang, Songbo Wang, Yang Qiao, Chun Xu, Chaoyang Zheng, Shengyi Zhou, Huanjun Wang, Fangming Li, Cong Zhang, Jiyu Wang) [LRM], 21 Sep 2025

Towards Transparent and Incentive-Compatible Collaboration in Decentralized LLM Multi-Agent Systems: A Blockchain-Driven Approach (Minfeng Qi, Tianqing Zhu, Lefeng Zhang, Ningran Li, Wanlei Zhou), 20 Sep 2025

From Scores to Steps: Diagnosing and Improving LLM Performance in Evidence-Based Medical Calculations (Benlu Wang, Iris Xia, Yifan Zhang, Junda Wang, Feiyun Ouyang, Shuo Han, Arman Cohan, Hong-ye Yu, Zonghai Yao) [ELM], 20 Sep 2025

SCAN: Self-Denoising Monte Carlo Annotation for Robust Process Reward Learning (Yuyang Ding, Xinyu Shi, Juntao Li, Xiaobo Liang, Zhaopeng Tu, Min Zhang) [SyDa], 20 Sep 2025

GPO: Learning from Critical Steps to Improve LLM Reasoning (Jiahao Yu, Zelei Cheng, Xian Wu, Xinyu Xing) [LRM], 19 Sep 2025
Rethinking Molecule Synthesizability with Chain-of-Reaction (Seul Lee, Karsten Kreis, Srimukh Prasad Veccham, Meng Liu, Danny Reidenbach, Saee Paliwal, Weili Nie, Arash Vahdat) [LRM, AI4CE], 19 Sep 2025

Best-of-L: Cross-Lingual Reward Modeling for Mathematical Reasoning (Sara Rajaee, Rochelle Choenni, Ekaterina Shutova, Christof Monz) [LRM], 19 Sep 2025

Reward Hacking Mitigation using Verifiable Composite Rewards (Mirza Farhan Bin Tarek, Rahmatollah Beheshti) [OffRL, LRM], 19 Sep 2025