2305.20050
Cited By
Let's Verify Step by Step
International Conference on Learning Representations (ICLR), 2023
31 May 2023
Hunter Lightman
V. Kosaraju
Yura Burda
Harrison Edwards
Bowen Baker
Teddy Lee
Jan Leike
John Schulman
Ilya Sutskever
K. Cobbe
ALM
OffRL
LRM
Papers citing
"Let's Verify Step by Step"
50 / 1,441 papers shown
Multi-Task Inference: Can Large Language Models Follow Multiple Instructions at Once?
Seunghyeok Hong
Sangwon Baek
Sangdae Nam
Guijin Son
Seungone Kim
ELM
LRM
429
25
0
18 Feb 2024
EventRL: Enhancing Event Extraction with Outcome Supervision for Large Language Models
Jun Gao
Huan Zhao
Wei Wang
Changlong Yu
Ruifeng Xu
OffRL
176
8
0
18 Feb 2024
I Learn Better If You Speak My Language: Understanding the Superior Performance of Fine-Tuning Large Language Models with LLM-Generated Responses
Xuan Ren
Biao Wu
Lingqiao Liu
276
13
0
17 Feb 2024
Reward Generalization in RLHF: A Topological Perspective
Tianyi Qiu
Fanzhi Zeng
Jiaming Ji
Dong Yan
Kaile Wang
Jiayi Zhou
Yang Han
Josef Dai
Xuehai Pan
Yaodong Yang
AI4CE
370
7
0
15 Feb 2024
OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset
Shubham Toshniwal
Ivan Moshkov
Mehrzad Samadi
Daria Gitman
Fei Jia
Igor Gitman
247
140
0
15 Feb 2024
MUSTARD: Mastering Uniform Synthesis of Theorem and Proof Data
Yinya Huang
Xiaohan Lin
Zhengying Liu
Qingxing Cao
Huajian Xin
Haiming Wang
Zhenguo Li
Linqi Song
Xiaodan Liang
ALM
373
45
0
14 Feb 2024
GLoRe: When, Where, and How to Improve LLM Reasoning via Global and Local Refinements
Alex Havrilla
Sharath Raparthy
Christoforus Nalmpantis
Jane Dwivedi-Yu
Maksym Zhuravinskyi
Eric Hambro
Roberta Raileanu
ReLM
LRM
245
95
0
13 Feb 2024
Suppressing Pink Elephants with Direct Principle Feedback
Louis Castricato
Nathan Lile
Suraj Anand
Hailey Schoelkopf
Siddharth Verma
Stella Biderman
273
13
0
12 Feb 2024
V-STaR: Training Verifiers for Self-Taught Reasoners
Arian Hosseini
Xingdi Yuan
Nikolay Malkin
Rameswar Panda
Alessandro Sordoni
Rishabh Agarwal
ReLM
LRM
321
192
0
09 Feb 2024
InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning
Huaiyuan Ying
Shuo Zhang
Linyang Li
Zhejian Zhou
Yunfan Shao
...
Hang Yan
Xipeng Qiu
Jiayu Wang
Kai-xiang Chen
Dahua Lin
ReLM
LRM
229
113
0
09 Feb 2024
Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning
Zhiheng Xi
Wenxiang Chen
Boyang Hong
Senjie Jin
Rui Zheng
...
Xinbo Zhang
Yang Liu
Tao Gui
Xuanjing Huang
LRM
206
53
0
08 Feb 2024
FaithLM: Towards Faithful Explanations for Large Language Models
Yu-Neng Chuang
Guanchu Wang
Chia-Yuan Chang
Ruixiang Tang
Shaochen Zhong
Fan Yang
Mengnan Du
Xuanting Cai
Helen Zhou
Xia Hu
LRM
312
4
0
07 Feb 2024
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Zhihong Shao
Peiyi Wang
Qihao Zhu
Runxin Xu
Jun-Mei Song
...
Haowei Zhang
Mingchuan Zhang
Yiming Li
Yu-Huan Wu
Daya Guo
ReLM
LRM
1.5K
3,768
0
05 Feb 2024
Unified Hallucination Detection for Multimodal Large Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Xiang Chen
Chenxi Wang
Yida Xue
Ningyu Zhang
Xiaoyan Yang
Qian Li
Yue Shen
Lei Liang
Jinjie Gu
Huajun Chen
HILM
434
67
0
05 Feb 2024
Empowering Time Series Analysis with Large Language Models: A Survey
International Joint Conference on Artificial Intelligence (IJCAI), 2024
Yushan Jiang
Zijie Pan
Xikun Zhang
Sahil Garg
Anderson Schneider
Yuriy Nevmyvaka
Dongjin Song
AI4TS
AIFin
364
79
0
05 Feb 2024
The Matrix: A Bayesian learning model for LLMs
Siddhartha Dalal
Vishal Misra
119
1
0
05 Feb 2024
Multi-step Problem Solving Through a Verifier: An Empirical Analysis on Model-induced Process Supervision
Zihan Wang
Yunxuan Li
Yuexin Wu
Liangchen Luo
Le Hou
Hongkun Yu
Jingbo Shang
LRM
228
42
0
05 Feb 2024
AMOR: A Recipe for Building Adaptable Modular Knowledge Agents Through Process Feedback
Jian Guan
Wei Wu
Zujie Wen
Peng Xu
Hongning Wang
Shiyu Huang
LRM
194
29
0
02 Feb 2024
StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback
Jiajun Sun
Yan Liu
Haoxiang Jia
Limao Xiong
Enyu Zhou
...
Changzhi Sun
Rui Zheng
Tao Gui
Xuanjing Huang
LLMAG
302
75
0
02 Feb 2024
Dense Reward for Free in Reinforcement Learning from Human Feedback
Alex J. Chan
Hao Sun
Samuel Holt
M. Schaar
268
60
0
01 Feb 2024
Learning Planning-based Reasoning by Trajectories Collection and Process Reward Synthesizing
Fangkai Jiao
Chengwei Qin
Zhengyuan Liu
Nancy F. Chen
Shafiq Joty
LRM
236
50
0
01 Feb 2024
Large Language Models for Mathematical Reasoning: Progresses and Challenges
Janice Ahn
Rishu Verma
Renze Lou
Di Liu
Rui Zhang
Wenpeng Yin
LRM
360
265
0
31 Jan 2024
EEG-GPT: Exploring Capabilities of Large Language Models for EEG Classification and Interpretation
Jonathan W. Kim
Ahmed Alaa
Danilo Bernardo
218
29
0
31 Jan 2024
Two Stones Hit One Bird: Bilevel Positional Encoding for Better Length Extrapolation
Zhenyu He
Guhao Feng
Shengjie Luo
Kai-Bo Yang
Liwei Wang
Jingjing Xu
Zhi Zhang
Hongxia Yang
Di He
191
23
0
29 Jan 2024
ARGS: Alignment as Reward-Guided Search
International Conference on Learning Representations (ICLR), 2024
Maxim Khanov
Jirayu Burapacheep
Yixuan Li
426
93
0
23 Jan 2024
Linear Alignment: A Closed-form Solution for Aligning Human Preferences without Tuning and Feedback
International Conference on Machine Learning (ICML), 2024
Songyang Gao
Qiming Ge
Wei Shen
Jiajun Sun
Junjie Ye
...
Yicheng Zou
Zhi Chen
Hang Yan
Tao Gui
Dahua Lin
231
20
0
21 Jan 2024
Augmenting Math Word Problems via Iterative Question Composing
AAAI Conference on Artificial Intelligence (AAAI), 2024
Haoxiong Liu
Yifan Zhang
Yifan Luo
Andrew Chi-Chih Yao
SyDa
LRM
534
64
0
17 Jan 2024
ReFT: Reasoning with Reinforced Fine-Tuning
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Trung Quoc Luong
Xinbo Zhang
Zhanming Jie
Yang Liu
Xiaoran Jin
Hang Li
OffRL
LRM
ReLM
316
236
0
17 Jan 2024
MARIO: MAth Reasoning with code Interpreter Output -- A Reproducible Pipeline
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Minpeng Liao
Wei Luo
Chengxi Li
Jing Wu
Kai Fan
LRM
298
70
0
16 Jan 2024
Beyond Sparse Rewards: Enhancing Reinforcement Learning with Language Model Critique in Text Generation
Meng Cao
Lei Shu
Lei Yu
Yun Zhu
Nevan Wichers
Yinxiao Liu
Lei Meng
OffRL
ALM
321
15
0
14 Jan 2024
CHAMP: A Competition-level Dataset for Fine-Grained Analyses of LLMs' Mathematical Reasoning Capabilities
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Yujun Mao
Yoon Kim
Yilun Zhou
LRM
ReLM
296
37
0
13 Jan 2024
Improving Large Language Models via Fine-grained Reinforcement Learning with Minimum Editing Constraint
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Zhipeng Chen
Kun Zhou
Wayne Xin Zhao
Junchen Wan
Fuzheng Zhang
Chen Zhang
Ji-Rong Wen
KELM
339
41
0
11 Jan 2024
Self-Contrast: Better Reflection Through Inconsistent Solving Perspectives
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Wenqi Zhang
Yongliang Shen
Linjuan Wu
Qiuying Peng
Jun Wang
Yueting Zhuang
Weiming Lu
LRM
LLMAG
511
95
0
04 Jan 2024
Olapa-MCoT: Enhancing the Chinese Mathematical Reasoning Capability of LLMs
Shaojie Zhu
Zhaobin Wang
Chengxiang Zhuo
Hui Lu
Bo Hu
Zang Li
LRM
127
0
0
29 Dec 2023
MR-GSM8K: A Meta-Reasoning Benchmark for Large Language Model Evaluation
Zhongshen Zeng
Pengguang Chen
Shu Liu
Haiyun Jiang
Jiaya Jia
ReLM
ELM
LRM
392
37
0
28 Dec 2023
Alleviating Hallucinations of Large Language Models through Induced Hallucinations
Yue Zhang
Leyang Cui
Wei Bi
Shuming Shi
HILM
299
74
0
25 Dec 2023
Prompt Valuation Based on Shapley Values
Hanxi Liu
Xiaokai Mao
Haocheng Xia
Jian Lou
Jinfei Liu
200
9
0
24 Dec 2023
Reasons to Reject? Aligning Language Models with Judgments
Weiwen Xu
Deng Cai
Zhisong Zhang
Wai Lam
Shuming Shi
ALM
346
17
0
22 Dec 2023
Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision
International Conference on Machine Learning (ICML), 2023
Collin Burns
Pavel Izmailov
Jan Hendrik Kirchner
Bowen Baker
Leo Gao
...
Adrien Ecoffet
Manas Joglekar
Jan Leike
Ilya Sutskever
Jeff Wu
ELM
344
382
0
14 Dec 2023
Math-Shepherd: Verify and Reinforce LLMs Step-by-step without Human Annotations
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Peiyi Wang
Lei Li
Zhihong Shao
R. X. Xu
Damai Dai
Yifei Li
Deli Chen
Y. Wu
Zhifang Sui
AIMat
LRM
ALM
442
662
0
14 Dec 2023
Alignment for Honesty
Neural Information Processing Systems (NeurIPS), 2023
Yuqing Yang
Ethan Chern
Xipeng Qiu
Graham Neubig
Pengfei Liu
257
58
0
12 Dec 2023
NLLG Quarterly arXiv Report 09/23: What are the most influential current AI Papers?
Ran Zhang
Aida Kostikova
Christoph Leiter
Jonas Belouadi
Daniil Larionov
Yanran Chen
Vivian Fresen
Steffen Eger
182
0
0
09 Dec 2023
Large Knowledge Model: Perspectives and Challenges
Data Intelligence (DI), 2023
Huajun Chen
KELM
352
22
0
05 Dec 2023
RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback
Computer Vision and Pattern Recognition (CVPR), 2023
Tianyu Yu
Yuan Yao
Haoye Zhang
Taiwen He
Yifeng Han
...
Xinyue Hu
Zhiyuan Liu
Hai-Tao Zheng
Maosong Sun
Tat-Seng Chua
MLLM
VLM
420
343
0
01 Dec 2023
LLM-Assisted Code Cleaning For Training Accurate Code Generators
International Conference on Learning Representations (ICLR), 2023
Naman Jain
Tianjun Zhang
Wei-Lin Chiang
Joseph E. Gonzalez
Koushik Sen
Ion Stoica
185
40
0
25 Nov 2023
Positional Description Matters for Transformers Arithmetic
Ruoqi Shen
Sébastien Bubeck
Ronen Eldan
Yin Tat Lee
Yuanzhi Li
Yi Zhang
265
56
0
22 Nov 2023
A Baseline Analysis of Reward Models' Ability To Accurately Analyze Foundation Models Under Distribution Shift
Will LeVine
Benjamin Pikus
Tony Chen
Sean Hendryx
592
16
0
21 Nov 2023
Igniting Language Intelligence: The Hitchhiker's Guide From Chain-of-Thought Reasoning to Language Agents
Zhuosheng Zhang
Yao Yao
Aston Zhang
Xiangru Tang
Xinbei Ma
...
Yiming Wang
Mark B. Gerstein
Rui Wang
Gongshen Liu
Hai Zhao
LLMAG
LM&Ro
LRM
357
91
0
20 Nov 2023
Meta Prompting for AI Systems
Yifan Zhang
Yang Yuan
Andrew Chi-Chih Yao
LLMAG
LRM
737
16
0
20 Nov 2023
OVM, Outcome-supervised Value Models for Planning in Mathematical Reasoning
Fei Yu
Anningzhe Gao
Benyou Wang
OffRL
LRM
228
82
0
16 Nov 2023