Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2305.20050
Cited By
Let's Verify Step by Step
31 May 2023
Hunter Lightman
V. Kosaraju
Yura Burda
Harrison Edwards
Bowen Baker
Teddy Lee
Jan Leike
John Schulman
Ilya Sutskever
K. Cobbe
ALM
OffRL
LRM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Let's Verify Step by Step"
50 / 188 papers shown
Title
Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models
Wenxuan Huang
Bohan Jia
Zijie Zhai
Shaosheng Cao
Zheyu Ye
Fei Zhao
Zhe Xu
Yao Hu
Shaohui Lin
MU
OffRL
LRM
MLLM
ReLM
VLM
55
40
0
09 Mar 2025
InftyThink: Breaking the Length Limits of Long-Context Reasoning in Large Language Models
Yuchen Yan
Yongliang Shen
Y. Liu
Jin Jiang
M. Zhang
Jian Shao
Yueting Zhuang
LRM
ReLM
53
3
0
09 Mar 2025
Can Atomic Step Decomposition Enhance the Self-structured Reasoning of Multimodal Large Models?
Kun Xiang
Zhili Liu
Zihao Jiang
Yunshuang Nie
Kaixin Cai
...
Yu-Jie Yuan
J. Han
Lanqing Hong
Hang Xu
Xiaodan Liang
ReLM
LRM
54
6
0
08 Mar 2025
DiffPO: Diffusion-styled Preference Optimization for Efficient Inference-Time Alignment of Large Language Models
Ruizhe Chen
Wenhao Chai
Zhifei Yang
Xiaotian Zhang
Joey Tianyi Zhou
Tony Q. S. Quek
Soujanya Poria
Zuozhu Liu
48
0
0
06 Mar 2025
Better Process Supervision with Bi-directional Rewarding Signals
Wenxiang Chen
Wei He
Zhiheng Xi
Honglin Guo
Boyang Hong
...
Nijun Li
Tao Gui
Yun Li
Qi Zhang
Xuanjing Huang
LRM
48
2
0
06 Mar 2025
DAST: Difficulty-Adaptive Slow-Thinking for Large Reasoning Models
Yi Shen
J. Zhang
Jieyun Huang
Shuming Shi
Wenjing Zhang
Jiangze Yan
Ning Wang
Kai Wang
Shiguo Lian
LRM
78
14
0
06 Mar 2025
Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs
Abdelrahman Abouelenin
Atabak Ashfaq
Adam Atkinson
Hany Awadalla
Nguyen Bach
...
Ishmam Zabir
Yunan Zhang
Li Zhang
Y. Zhang
Xiren Zhou
MoE
SyDa
68
23
0
03 Mar 2025
How Well do LLMs Compress Their Own Chain-of-Thought? A Token Complexity Approach
Ayeong Lee
Ethan Che
Tianyi Peng
LRM
42
10
0
03 Mar 2025
Safety Tax: Safety Alignment Makes Your Large Reasoning Models Less Reasonable
Tiansheng Huang
Sihao Hu
Fatih Ilhan
Selim Furkan Tekin
Zachary Yahn
Yichang Xu
Ling Liu
53
8
0
01 Mar 2025
Self-Training Elicits Concise Reasoning in Large Language Models
Tergel Munkhbat
Namgyu Ho
S. Kim
Yongjin Yang
Yujin Kim
Se-Young Yun
ReLM
LRM
54
10
0
27 Feb 2025
Can Large Language Models Detect Errors in Long Chain-of-Thought Reasoning?
Yancheng He
Shilong Li
J. Liu
Weixun Wang
Xingyuan Bu
...
Zhongyuan Peng
Z. Zhang
Zhicheng Zheng
Wenbo Su
Bo Zheng
ELM
LRM
79
7
0
26 Feb 2025
How Far are LLMs from Real Search? A Comprehensive Study on Efficiency, Completeness, and Inherent Capabilities
M. Lin
Hui Liu
X. Tang
Jingying Zeng
Zhenwei Dai
Chen Luo
Zheng Li
Xiang Zhang
Qi He
Suhang Wang
OffRL
LRM
41
0
0
25 Feb 2025
LeanProgress: Guiding Search for Neural Theorem Proving via Proof Progress Prediction
Suozhi Huang
Peiyang Song
Robert Joseph George
Anima Anandkumar
AI4TS
LRM
42
2
0
25 Feb 2025
Mitigating Tail Narrowing in LLM Self-Improvement via Socratic-Guided Sampling
Yiwen Ding
Zhiheng Xi
Wei He
Zhuoyuan Li
Yitao Zhai
Xiaowei Shi
Xunliang Cai
Tao Gui
Qi Zhang
Xuanjing Huang
LRM
69
3
0
24 Feb 2025
NaturalReasoning: Reasoning in the Wild with 2.8M Challenging Questions
Weizhe Yuan
Jane Dwivedi-Yu
Song Jiang
Karthik Padthe
Yang Li
...
Ilia Kulikov
Kyunghyun Cho
Yuandong Tian
Jason Weston
Xian Li
ReLM
LRM
51
10
0
24 Feb 2025
DISC: Dynamic Decomposition Improves LLM Inference Scaling
Jonathan Light
Wei Cheng
Wu Yue
Masafumi Oyamada
Mengdi Wang
Santiago Paternain
Haifeng Chen
ReLM
LRM
56
2
0
23 Feb 2025
Curie: Toward Rigorous and Automated Scientific Experimentation with AI Agents
Patrick Tser Jern Kon
Jiachen Liu
Qiuyi Ding
Yiming Qiu
Zhenning Yang
Yibo Huang
Jayanth Srinivasa
Myungjin Lee
Mosharaf Chowdhury
Ang Chen
51
3
0
22 Feb 2025
Dynamic Parallel Tree Search for Efficient LLM Reasoning
Yifu Ding
Wentao Jiang
Shunyu Liu
Yongcheng Jing
J. Guo
...
Zengmao Wang
Z. Liu
Bo Du
X. Liu
Dacheng Tao
LRM
44
4
0
22 Feb 2025
Synthesizing Post-Training Data for LLMs through Multi-Agent Simulation
Shuo Tang
Xianghe Pang
Zexi Liu
Bohan Tang
Rui Ye
Xiaowen Dong
Y. Wang
Yanfeng Wang
S. Chen
SyDa
LLMAG
127
3
0
21 Feb 2025
A Survey on Feedback-based Multi-step Reasoning for Large Language Models on Mathematics
Ting-Ruen Wei
Haowei Liu
Xuyang Wu
Yi Fang
LRM
AI4CE
ReLM
KELM
155
1
0
21 Feb 2025
Prompt-based Depth Pruning of Large Language Models
Juyun Wee
Minjae Park
Jaeho Lee
VLM
86
0
0
17 Feb 2025
A Critical Look At Tokenwise Reward-Guided Text Generation
Ahmad Rashid
Ruotian Wu
Julia Grosse
Agustinus Kristiadi
Pascal Poupart
OffRL
68
0
0
17 Feb 2025
AURORA:Automated Training Framework of Universal Process Reward Models via Ensemble Prompting and Reverse Verification
Xiaoyu Tan
Tianchu Yao
C. Qu
Bin Li
Minghao Yang
...
Haozhe Wang
Xihe Qiu
Wei Chu
Yinghui Xu
Yuan Qi
OffRL
LRM
44
2
0
17 Feb 2025
video-SALMONN-o1: Reasoning-enhanced Audio-visual Large Language Model
Guangzhi Sun
Yudong Yang
Jimin Zhuang
Changli Tang
Y. Li
W. Li
Z. Ma
Chao Zhang
LRM
MLLM
VLM
64
3
0
17 Feb 2025
Preference Optimization for Reasoning with Pseudo Feedback
Fangkai Jiao
Geyang Guo
Xingxing Zhang
Nancy F. Chen
Shafiq R. Joty
Furu Wei
LRM
99
9
0
17 Feb 2025
Uncertainty-Aware Step-wise Verification with Generative Reward Models
Zihuiwen Ye
L. Melo
Younesse Kaddar
Phil Blunsom
S. Kamath S
Yarin Gal
LRM
44
0
0
16 Feb 2025
BoT: Breaking Long Thought Processes of o1-like Large Language Models through Backdoor Attack
Zihao Zhu
Hongbao Zhang
Mingda Zhang
Ruotong Wang
Guanzong Wu
Ke Xu
Baoyuan Wu
AAML
LRM
54
4
0
16 Feb 2025
Leveraging Constrained Monte Carlo Tree Search to Generate Reliable Long Chain-of-Thought for Mathematical Reasoning
Qingwen Lin
Boyan Xu
Zijian Li
Z. Hao
Keli Zhang
Ruichu Cai
LRM
41
2
0
16 Feb 2025
MATH-Perturb: Benchmarking LLMs' Math Reasoning Abilities against Hard Perturbations
Kaixuan Huang
Jiacheng Guo
Zihao Li
X. Ji
Jiawei Ge
...
Yangsibo Huang
Chi Jin
Xinyun Chen
Chiyuan Zhang
Mengdi Wang
AAML
LRM
91
7
0
10 Feb 2025
ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates
L. Yang
Zhaochen Yu
Bin Cui
Mengdi Wang
ReLM
LRM
AI4CE
96
10
0
10 Feb 2025
Rationalization Models for Text-to-SQL
Gaetano Rossiello
Nhan Pham
Michael R. Glass
Junkyu Lee
Shankar Subramanian
ReLM
LRM
50
0
0
10 Feb 2025
Unbiased Evaluation of Large Language Models from a Causal Perspective
Meilin Chen
Jian Tian
Liang Ma
Di Xie
Weijie Chen
Jiang Zhu
ALM
ELM
52
0
0
10 Feb 2025
Iterative Deepening Sampling for Large Language Models
Weizhe Chen
Sven Koenig
B. Dilkina
LRM
ReLM
88
0
0
08 Feb 2025
Beyond Prompt Content: Enhancing LLM Performance via Content-Format Integrated Prompt Optimization
Yuanye Liu
Jiahang Xu
Li Lyna Zhang
Qi Chen
Xuan Feng
Yang Chen
Zhongxin Guo
Yuqing Yang
Cheng Peng
82
2
0
06 Feb 2025
The Cake that is Intelligence and Who Gets to Bake it: An AI Analogy and its Implications for Participation
Martin Mundt
Anaelia Ovalle
Felix Friedrich
A Pranav
Subarnaduti Paul
Manuel Brack
Kristian Kersting
William Agnew
240
0
0
05 Feb 2025
Policy Guided Tree Search for Enhanced LLM Reasoning
Yang Li
LRM
53
0
0
04 Feb 2025
Boosting Multimodal Reasoning with MCTS-Automated Structured Thinking
Jinyang Wu
Mingkuan Feng
Shuai Zhang
Ruihan Jin
Feihu Che
Zengqi Wen
J. Tao
LRM
68
8
0
04 Feb 2025
A Probabilistic Inference Approach to Inference-Time Scaling of LLMs using Particle-Based Monte Carlo Methods
Isha Puri
Shivchander Sudalairaj
Guangxuan Xu
Kai Xu
Akash Srivastava
LRM
74
3
0
03 Feb 2025
Learning to Generate Unit Tests for Automated Debugging
Archiki Prasad
Elias Stengel-Eskin
Justin Chih-Yao Chen
Zaid Khan
Mohit Bansal
ELM
81
1
0
03 Feb 2025
Self-Improving Transformers Overcome Easy-to-Hard and Length Generalization Challenges
Nayoung Lee
Ziyang Cai
Avi Schwarzschild
Kangwook Lee
Dimitris Papailiopoulos
ReLM
VLM
LRM
AI4CE
73
4
0
03 Feb 2025
Think Smarter not Harder: Adaptive Reasoning with Inference Aware Optimization
Zishun Yu
Tengyu Xu
Di Jin
Karthik Abinav Sankararaman
Yun He
...
Eryk Helenowski
Chen Zhu
Sinong Wang
Hao Ma
Han Fang
LRM
51
4
0
29 Jan 2025
COS(M+O)S: Curiosity and RL-Enhanced MCTS for Exploring Story Space via Language Models
Tobias Materzok
LRM
67
0
0
28 Jan 2025
Fact, Fetch, and Reason: A Unified Evaluation of Retrieval-Augmented Generation
Satyapriya Krishna
Kalpesh Krishna
Anhad Mohananey
Steven Schwarcz
Adam Stambler
Shyam Upadhyay
Manaal Faruqui
ReLM
3DV
LRM
RALM
35
13
0
28 Jan 2025
Test-Time Preference Optimization: On-the-Fly Alignment via Iterative Textual Feedback
Yafu Li
Xuyang Hu
Xiaoye Qu
Linjie Li
Yu-Xi Cheng
51
3
0
22 Jan 2025
Kimi k1.5: Scaling Reinforcement Learning with LLMs
Kimi Team
Angang Du
Bofei Gao
Bowei Xing
Changjiu Jiang
...
Zhilin Yang
Zhiqi Huang
Zihao Huang
Ziyao Xu
Z. Yang
VLM
ALM
OffRL
AI4TS
LRM
106
135
0
22 Jan 2025
MONA: Myopic Optimization with Non-myopic Approval Can Mitigate Multi-step Reward Hacking
Sebastian Farquhar
Vikrant Varma
David Lindner
David Elson
Caleb Biddulph
Ian Goodfellow
Rohin Shah
82
1
0
22 Jan 2025
Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling
Zhenyu Hou
Xin Lv
Rui Lu
J. Zhang
Y. Li
Zijun Yao
Juanzi Li
J. Tang
Yuxiao Dong
OffRL
LRM
ReLM
52
20
0
20 Jan 2025
Deep Compression Autoencoder for Efficient High-Resolution Diffusion Models
Junyu Chen
Han Cai
Junsong Chen
E. Xie
Shang Yang
Haotian Tang
Muyang Li
Y. Lu
Song Han
DiffM
61
36
0
20 Jan 2025
Planning-Driven Programming: A Large Language Model Programming Workflow
Chao Lei
Yanchuan Chang
N. Lipovetzky
Krista A. Ehinger
81
2
0
10 Jan 2025
URSA: Understanding and Verifying Chain-of-thought Reasoning in Multimodal Mathematics
Ruilin Luo
Zhuofan Zheng
Yifan Wang
Yiyao Yu
Xinzhe Ni
Zicheng Lin
Jin Zeng
Yujiu Yang
LRM
70
13
0
08 Jan 2025
Previous
1
2
3
4
Next