arXiv:2305.20050
Let's Verify Step by Step
International Conference on Learning Representations (ICLR), 2023
31 May 2023
Hunter Lightman
V. Kosaraju
Yura Burda
Harrison Edwards
Bowen Baker
Teddy Lee
Jan Leike
John Schulman
Ilya Sutskever
K. Cobbe
ALM
OffRL
LRM
Papers citing "Let's Verify Step by Step" (50 of 1,441 papers shown)
Token-level Proximal Policy Optimization for Query Generation
Yichen Ouyang
Lu Wang
Fangkai Yang
Lu Wang
Chenghua Huang
...
Saravan Rajmohan
Weiwei Deng
Dongmei Zhang
Feng Sun
Qi Zhang
OffRL
897
7
0
01 Nov 2024
RL-STaR: Theoretical Analysis of Reinforcement Learning Frameworks for Self-Taught Reasoner
Fu-Chieh Chang
Yu-Ting Lee
Hui-Ying Shih
Pei-Yuan Wu
OffRL
LRM
993
1
0
31 Oct 2024
Smaller Large Language Models Can Do Moral Self-Correction
Guangliang Liu
Zhiyu Xue
Rongrong Wang
Kristen Marie Johnson
LRM
357
2
0
30 Oct 2024
Flow-DPO: Improving LLM Mathematical Reasoning through Online Multi-Agent Learning
Yihe Deng
Paul Mineiro
LRM
213
9
0
29 Oct 2024
AutoGLM: Autonomous Foundation Agents for GUIs
Xiao Liu
Bo Qin
Dongzhu Liang
Guang Dong
Hanyu Lai
...
Yujia Wang
Yongjun Xu
Zehan Qi
Yuxiao Dong
Jie Tang
LLMAG
313
51
0
28 Oct 2024
HoPE: A Novel Positional Encoding Without Long-Term Decay for Enhanced Context Awareness and Extrapolation
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Yuhan Chen
Ang Lv
Jian Luan
Bin Wang
Wen Liu
227
10
0
28 Oct 2024
Matryoshka Pilot: Learning to Drive Black-Box LLMs with LLMs
Changhao Li
Yuchen Zhuang
Rushi Qiang
Haotian Sun
H. Dai
Chao Zhang
Bo Dai
LRM
325
6
0
28 Oct 2024
Guiding Through Complexity: What Makes Good Supervision for Hard Math Reasoning Tasks?
Xuan He
Da Yin
Nanyun Peng
LRM
265
0
0
27 Oct 2024
GFlowNet Fine-tuning for Diverse Correct Solutions in Mathematical Reasoning Tasks
Ryoichi Takase
Masaya Tsunokake
Yuta Tsuchiya
Shota Inuzuka
LRM
193
6
0
26 Oct 2024
GHIL-Glue: Hierarchical Control with Filtered Subgoal Images
IEEE International Conference on Robotics and Automation (ICRA), 2024
Kyle Hatch
Ashwin Balakrishna
Oier Mees
Suraj Nair
Seohong Park
...
Masha Itkina
Benjamin Eysenbach
Sergey Levine
Thomas Kollar
Benjamin Burchfiel
312
8
0
26 Oct 2024
Cooperative Strategic Planning Enhances Reasoning Capabilities in Large Language Models
Danqing Wang
Zhuorui Ye
Fei Fang
Lei Li
LLMAG
LRM
200
4
0
25 Oct 2024
2D-DPO: Scaling Direct Preference Optimization with 2-Dimensional Supervision
Shilong Li
Yancheng He
Hui Huang
Xingyuan Bu
Qingbin Liu
Hangyu Guo
Weixun Wang
Jihao Gu
Yuchi Xu
Bo Zheng
227
9
0
25 Oct 2024
Improving Small-Scale Large Language Models Function Calling for Reasoning Tasks
Graziano A. Manduzio
Federico A. Galatolo
M. G. Cimino
Enzo Pasquale Scilingo
Lorenzo Cominelli
LRM
192
9
0
24 Oct 2024
Hybrid Preferences: Learning to Route Instances for Human vs. AI Feedback
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Lester James V. Miranda
Yizhong Wang
Yanai Elazar
Sachin Kumar
Valentina Pyatkin
Faeze Brahman
Noah A. Smith
Hannaneh Hajishirzi
Pradeep Dasigi
457
20
0
24 Oct 2024
Process Supervision-Guided Policy Optimization for Code Generation
Ning Dai
Zheng Wu
Renjie Zheng
Ziyun Wei
Wenlei Shi
Xing Jin
Guanlin Liu
Chen Dun
Liang Huang
Lin Yan
263
19
0
23 Oct 2024
RM-Bench: Benchmarking Reward Models of Language Models with Subtlety and Style
International Conference on Learning Representations (ICLR), 2024
Yantao Liu
Zijun Yao
Rui Min
Yixin Cao
Lei Hou
Juanzi Li
OffRL
ALM
345
100
0
21 Oct 2024
On Designing Effective RL Reward at Training Time for LLM Reasoning
Jiaxuan Gao
Shusheng Xu
Wenjie Ye
Weilin Liu
Chuyi He
Wei Fu
Zhiyu Mei
Guangju Wang
Yi Wu
OffRL
LRM
549
55
0
19 Oct 2024
Step Guided Reasoning: Improving Mathematical Reasoning using Guidance Generation and Step Reasoning
Lang Cao
Chao Peng
Renhong Chen
Wu Ning
Yingtian Zou
Yitong Li
LRM
373
2
0
18 Oct 2024
Think Thrice Before You Act: Progressive Thought Refinement in Large Language Models
Chengyu Du
Jinyi Han
Yizhou Ying
Aili Chen
Qianyu He
...
Haoran Guo
Jiaqing Liang
Zulong Chen
Liangyue Li
Yanghua Xiao
KELM
CLL
LRM
245
5
0
17 Oct 2024
MCQG-SRefine: Multiple Choice Question Generation and Evaluation with Iterative Self-Critique, Correction, and Comparison Feedback
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Zonghai Yao
Aditya Parashar
Huixue Zhou
Won Seok Jang
Feiyun Ouyang
Zhichao Yang
Hong-ye Yu
ELM
429
16
0
17 Oct 2024
Enhancing Mathematical Reasoning in LLMs by Stepwise Correction
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Zhenyu Wu
Qingkai Zeng
Zizhuo Zhang
Zhaoxuan Tan
Chao Shen
Meng Jiang
KELM
LRM
264
8
0
16 Oct 2024
A Survey on Data Synthesis and Augmentation for Large Language Models
Ke Wang
Jiahui Zhu
Minjie Ren
Ziqiang Liu
Shiwei Li
...
Yiming Lei
Xiaoyu Wu
Qiqi Zhan
Qingjie Liu
Yunhong Wang
SyDa
425
36
0
16 Oct 2024
Not All Votes Count! Programs as Verifiers Improve Self-Consistency of Language Models for Math Reasoning
Vernon Y.H. Toh
Deepanway Ghosal
Soujanya Poria
LRM
182
7
0
16 Oct 2024
MSc-SQL: Multi-Sample Critiquing Small Language Models For Text-To-SQL Translation
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
S. Gorti
Ilan Gofman
Zhaoyan Liu
Jiapeng Wu
Noël Vouitsis
Guangwei Yu
Jesse C. Cresswell
Rasa Hosseinzadeh
SyDa
427
21
0
16 Oct 2024
JudgeBench: A Benchmark for Evaluating LLM-based Judges
International Conference on Learning Representations (ICLR), 2024
Sijun Tan
Siyuan Zhuang
Kyle Montgomery
William Y. Tang
Alejandro Cuadron
Chenguang Wang
Raluca A. Popa
Ion Stoica
ELM
ALM
722
149
0
16 Oct 2024
Process Reward Model with Q-Value Rankings
International Conference on Learning Representations (ICLR), 2024
W. Li
Yixuan Li
LRM
631
61
0
15 Oct 2024
Agent-as-a-Judge: Evaluate Agents with Agents
Mingchen Zhuge
Changsheng Zhao
Dylan R. Ashley
Wenyi Wang
Dmitrii Khizbullin
...
Raghuraman Krishnamoorthi
Yuandong Tian
Yangyang Shi
Vikas Chandra
Jürgen Schmidhuber
ELM
411
112
0
14 Oct 2024
Overcoming classic challenges for artificial neural networks by providing incentives and practice
Nature Machine Intelligence (Nat. Mach. Intell.), 2024
Kazuki Irie
Brenden M. Lake
585
8
0
14 Oct 2024
Innovative Thinking, Infinite Humor: Humor Research of Large Language Models through Structured Thought Leaps
International Conference on Learning Representations (ICLR), 2024
Han Wang
Yilin Zhao
Dian Li
Xiaohan Wang
Gang Liu
Xuguang Lan
Jian Shu
LRM
459
3
0
14 Oct 2024
CoMAT: Chain of Mathematically Annotated Thought Improves Mathematical Reasoning
Joshua Ong Jun Leang
Aryo Pradipta Gema
Shay B. Cohen
ReLM
LRM
ReCod
412
10
0
14 Oct 2024
Language Model Embeddings Can Be Sufficient for Bayesian Optimization
Tung Nguyen
Qiuyi Zhang
Bangding Yang
Chansoo Lee
J. Bornschein
Yingjie Miao
Sagi Perel
Yutian Chen
Xingyou Song
BDL
366
11
0
14 Oct 2024
COrAL: Order-Agnostic Language Modeling for Efficient Iterative Refinement
Yuxi Xie
Anirudh Goyal
Xiaobao Wu
Xunjian Yin
Xiao Xu
Min-Yen Kan
Liangming Pan
William Yang Wang
LRM
901
1
0
12 Oct 2024
OpenR: An Open Source Framework for Advanced Reasoning with Large Language Models
Jun Wang
Meng Fang
Bo Liu
Muning Wen
Jiachen Zhu
...
Lei Chen
Lionel M. Ni
Linyi Yang
Ying Wen
Weinan Zhang
LRM
232
61
0
12 Oct 2024
Boosting Deductive Reasoning with Step Signals In RLHF
Jiajun Li
Yipin Zhang
Wei Shen
Yuzi Yan
Jian Xie
Dong Yan
LRM
ReLM
156
2
0
12 Oct 2024
Enhancing Multi-Step Reasoning Abilities of Language Models through Direct Q-Function Optimization
Guanlin Liu
Kaixuan Ji
Ning Dai
Zheng Wu
Chen Dun
Quanquan Gu
Lin Yan
OffRL
LRM
376
18
0
11 Oct 2024
Rewarding Progress: Scaling Automated Process Verifiers for LLM Reasoning
International Conference on Learning Representations (ICLR), 2024
Amrith Rajagopal Setlur
Chirag Nagpal
Adam Fisch
Xinyang Geng
Jacob Eisenstein
Rishabh Agarwal
Alekh Agarwal
Jonathan Berant
Aviral Kumar
OffRL
LRM
401
166
0
10 Oct 2024
Efficient Reinforcement Learning with Large Language Model Priors
Xue Yan
Yan Song
Xidong Feng
Mengyue Yang
Haifeng Zhang
Haitham Bou Ammar
Jun Wang
OffRL
213
20
0
10 Oct 2024
Automatic Curriculum Expert Iteration for Reliable LLM Reasoning
International Conference on Learning Representations (ICLR), 2024
Zirui Zhao
Hanze Dong
Amrita Saha
Caiming Xiong
Doyen Sahoo
LRM
364
13
0
10 Oct 2024
Towards Self-Improvement of LLMs via MCTS: Leveraging Stepwise Knowledge with Curriculum Preference Learning
Xiyao Wang
Linfeng Song
Ye Tian
Dian Yu
Baolin Peng
Haitao Mi
Furong Huang
Dong Yu
LRM
303
22
0
09 Oct 2024
Subtle Errors in Reasoning: Preference Learning via Error-injected Self-editing
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Kaishuai Xu
Tiezheng YU
Wenjun Hou
Yi Cheng
Chak Tou Leong
Liangyou Li
Xin Jiang
Lifeng Shang
Qun Liu
Wenjie Li
LRM
1.0K
0
0
09 Oct 2024
O1 Replication Journey: A Strategic Progress Report -- Part 1
Yiwei Qin
Xuefeng Li
Haoyang Zou
Yixiu Liu
Shijie Xia
...
Yixin Ye
Weizhe Yuan
Hector Liu
Rui Wang
Pengfei Liu
VLM
364
137
0
08 Oct 2024
On the Modeling Capabilities of Large Language Models for Sequential Decision Making
International Conference on Learning Representations (ICLR), 2024
Martin Klissarov
Devon Hjelm
Alexander Toshev
Bogdan Mazoure
LM&Ro
ELM
OffRL
LRM
309
7
0
08 Oct 2024
FG-PRM: Fine-grained Hallucination Detection and Mitigation in Language Model Mathematical Reasoning
Ruosen Li
Ziming Luo
Xinya Du
LRM
256
8
0
08 Oct 2024
Rationale-Aware Answer Verification by Pairwise Self-Evaluation
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Akira Kawabata
Saku Sugawara
LRM
346
7
0
07 Oct 2024
Reasoning Paths Optimization: Learning to Reason and Explore From Diverse Paths
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Yew Ken Chia
Guizhen Chen
Weiwen Xu
Luu Anh Tuan
Soujanya Poria
Lidong Bing
LRM
241
4
0
07 Oct 2024
From Sparse Dependence to Sparse Attention: Unveiling How Chain-of-Thought Enhances Transformer Sample Efficiency
International Conference on Learning Representations (ICLR), 2024
Kaiyue Wen
Huaqing Zhang
Hongzhou Lin
Jingzhao Zhang
MoE
LRM
573
14
0
07 Oct 2024
Active Fine-Tuning of Multi-Task Policies
Marco Bagatella
Jonas Hübotter
Georg Martius
Andreas Krause
566
0
0
07 Oct 2024
Improving LLM Reasoning through Scaling Inference Computation with Collaborative Verification
Zhenwen Liang
Ye Liu
Tong Niu
Xiangliang Zhang
Yingbo Zhou
Semih Yavuz
LRM
267
35
0
05 Oct 2024
Misinformation with Legal Consequences (MisLC): A New Task Towards Harnessing Societal Harm of Misinformation
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Chu Fei Luo
Radin Shayanfar
R. Bhambhoria
Samuel Dahan
Xiaodan Zhu
AILaw
215
2
0
04 Oct 2024
System 2 Reasoning Capabilities Are Nigh
Scott C. Lowe
VLM
LRM
203
2
0
04 Oct 2024