Let's Verify Step by Step

International Conference on Learning Representations (ICLR), 2023
31 May 2023
Hunter Lightman
V. Kosaraju
Yura Burda
Harrison Edwards
Bowen Baker
Teddy Lee
Jan Leike
John Schulman
Ilya Sutskever
K. Cobbe
ALM, OffRL, LRM

Papers citing "Let's Verify Step by Step"

50 / 1,441 papers shown
Multi-Task Inference: Can Large Language Models Follow Multiple Instructions at Once?
Seunghyeok Hong
Sangwon Baek
Sangdae Nam
Guijin Son
Seungone Kim
ELM, LRM
429
25
0
18 Feb 2024
EventRL: Enhancing Event Extraction with Outcome Supervision for Large Language Models
Jun Gao
Huan Zhao
Wei Wang
Changlong Yu
Ruifeng Xu
OffRL
176
8
0
18 Feb 2024
I Learn Better If You Speak My Language: Understanding the Superior Performance of Fine-Tuning Large Language Models with LLM-Generated Responses
Xuan Ren
Biao Wu
Lingqiao Liu
276
13
0
17 Feb 2024
Reward Generalization in RLHF: A Topological Perspective
Tianyi Qiu
Fanzhi Zeng
Jiaming Ji
Dong Yan
Kaile Wang
Jiayi Zhou
Yang Han
Josef Dai
Xuehai Pan
Yaodong Yang
AI4CE
370
7
0
15 Feb 2024
OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset
Shubham Toshniwal
Ivan Moshkov
Mehrzad Samadi
Daria Gitman
Fei Jia
Igor Gitman
247
140
0
15 Feb 2024
MUSTARD: Mastering Uniform Synthesis of Theorem and Proof Data
Yinya Huang
Xiaohan Lin
Zhengying Liu
Qingxing Cao
Huajian Xin
Haiming Wang
Zhenguo Li
Linqi Song
Xiaodan Liang
ALM
373
45
0
14 Feb 2024
GLoRe: When, Where, and How to Improve LLM Reasoning via Global and Local Refinements
Alex Havrilla
Sharath Raparthy
Christoforus Nalmpantis
Jane Dwivedi-Yu
Maksym Zhuravinskyi
Eric Hambro
Roberta Raileanu
ReLM, LRM
245
95
0
13 Feb 2024
Suppressing Pink Elephants with Direct Principle Feedback
Louis Castricato
Nathan Lile
Suraj Anand
Hailey Schoelkopf
Siddharth Verma
Stella Biderman
273
13
0
12 Feb 2024
V-STaR: Training Verifiers for Self-Taught Reasoners
Arian Hosseini
Xingdi Yuan
Nikolay Malkin
Rameswar Panda
Alessandro Sordoni
Rishabh Agarwal
ReLM, LRM
321
192
0
09 Feb 2024
InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning
Huaiyuan Ying
Shuo Zhang
Linyang Li
Zhejian Zhou
Yunfan Shao
...
Hang Yan
Xipeng Qiu
Jiayu Wang
Kai-xiang Chen
Dahua Lin
ReLM, LRM
229
113
0
09 Feb 2024
Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning
Zhiheng Xi
Wenxiang Chen
Boyang Hong
Senjie Jin
Rui Zheng
...
Xinbo Zhang
Yang Liu
Tao Gui
Tao Gui
Xuanjing Huang
LRM
206
53
0
08 Feb 2024
FaithLM: Towards Faithful Explanations for Large Language Models
Yu-Neng Chuang
Guanchu Wang
Chia-Yuan Chang
Ruixiang Tang
Shaochen Zhong
Fan Yang
Mengnan Du
Xuanting Cai
Helen Zhou
Xia Hu
LRM
312
4
0
07 Feb 2024
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Zhihong Shao
Peiyi Wang
Qihao Zhu
Runxin Xu
Jun-Mei Song
...
Haowei Zhang
Mingchuan Zhang
Yiming Li
Yu-Huan Wu
Daya Guo
ReLM, LRM
1.5K
3,768
0
05 Feb 2024
Unified Hallucination Detection for Multimodal Large Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Xiang Chen
Chenxi Wang
Yida Xue
Ningyu Zhang
Xiaoyan Yang
Qian Li
Yue Shen
Lei Liang
Jinjie Gu
Huajun Chen
HILM
434
67
0
05 Feb 2024
Empowering Time Series Analysis with Large Language Models: A Survey
International Joint Conference on Artificial Intelligence (IJCAI), 2024
Yushan Jiang
Zijie Pan
Xikun Zhang
Sahil Garg
Anderson Schneider
Yuriy Nevmyvaka
Dongjin Song
AI4TS, AIFin
364
79
0
05 Feb 2024
The Matrix: A Bayesian learning model for LLMs
Siddhartha Dalal
Vishal Misra
119
1
0
05 Feb 2024
Multi-step Problem Solving Through a Verifier: An Empirical Analysis on Model-induced Process Supervision
Zihan Wang
Yunxuan Li
Yuexin Wu
Liangchen Luo
Le Hou
Hongkun Yu
Jingbo Shang
LRM
228
42
0
05 Feb 2024
AMOR: A Recipe for Building Adaptable Modular Knowledge Agents Through Process Feedback
Jian Guan
Wei Wu
Zujie Wen
Peng Xu
Hongning Wang
Shiyu Huang
LRM
194
29
0
02 Feb 2024
StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback
Jiajun Sun
Yan Liu
Haoxiang Jia
Limao Xiong
Enyu Zhou
...
Changzhi Sun
Rui Zheng
Tao Gui
Xuanjing Huang
Tao Gui
LLMAG
302
75
0
02 Feb 2024
Dense Reward for Free in Reinforcement Learning from Human Feedback
Alex J. Chan
Hao Sun
Samuel Holt
M. Schaar
268
60
0
01 Feb 2024
Learning Planning-based Reasoning by Trajectories Collection and Process Reward Synthesizing
Fangkai Jiao
Chengwei Qin
Zhengyuan Liu
Nancy F. Chen
Shafiq Joty
LRM
236
50
0
01 Feb 2024
Large Language Models for Mathematical Reasoning: Progresses and Challenges
Janice Ahn
Rishu Verma
Renze Lou
Di Liu
Rui Zhang
Wenpeng Yin
LRM
360
265
0
31 Jan 2024
EEG-GPT: Exploring Capabilities of Large Language Models for EEG Classification and Interpretation
Jonathan W. Kim
Ahmed Alaa
Danilo Bernardo
218
29
0
31 Jan 2024
Two Stones Hit One Bird: Bilevel Positional Encoding for Better Length Extrapolation
Zhenyu He
Guhao Feng
Shengjie Luo
Kai-Bo Yang
Liwei Wang
Jingjing Xu
Zhi Zhang
Hongxia Yang
Di He
191
23
0
29 Jan 2024
ARGS: Alignment as Reward-Guided Search
International Conference on Learning Representations (ICLR), 2024
Maxim Khanov
Jirayu Burapacheep
Yixuan Li
426
93
0
23 Jan 2024
Linear Alignment: A Closed-form Solution for Aligning Human Preferences without Tuning and Feedback
International Conference on Machine Learning (ICML), 2024
Songyang Gao
Qiming Ge
Wei Shen
Jiajun Sun
Junjie Ye
...
Yicheng Zou
Zhi Chen
Hang Yan
Tao Gui
Dahua Lin
231
20
0
21 Jan 2024
Augmenting Math Word Problems via Iterative Question Composing
AAAI Conference on Artificial Intelligence (AAAI), 2024
Haoxiong Liu
Yifan Zhang
Yifan Luo
Andrew Chi-Chih Yao
SyDa, LRM
534
64
0
17 Jan 2024
ReFT: Reasoning with Reinforced Fine-Tuning
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Trung Quoc Luong
Xinbo Zhang
Zhanming Jie
Yang Liu
Xiaoran Jin
Hang Li
OffRL, LRM, ReLM
316
236
0
17 Jan 2024
MARIO: MAth Reasoning with code Interpreter Output -- A Reproducible Pipeline
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Minpeng Liao
Wei Luo
Chengxi Li
Jing Wu
Kai Fan
LRM
298
70
0
16 Jan 2024
Beyond Sparse Rewards: Enhancing Reinforcement Learning with Language Model Critique in Text Generation
Meng Cao
Lei Shu
Lei Yu
Yun Zhu
Nevan Wichers
Yinxiao Liu
Lei Meng
OffRL, ALM
321
15
0
14 Jan 2024
CHAMP: A Competition-level Dataset for Fine-Grained Analyses of LLMs' Mathematical Reasoning Capabilities
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Yujun Mao
Yoon Kim
Yilun Zhou
LRM, ReLM
296
37
0
13 Jan 2024
Improving Large Language Models via Fine-grained Reinforcement Learning with Minimum Editing Constraint
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Zhipeng Chen
Kun Zhou
Wayne Xin Zhao
Junchen Wan
Fuzheng Zhang
Chen Zhang
Ji-Rong Wen
KELM
339
41
0
11 Jan 2024
Self-Contrast: Better Reflection Through Inconsistent Solving Perspectives
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Wenqi Zhang
Yongliang Shen
Linjuan Wu
Qiuying Peng
Jun Wang
Yueting Zhuang
Weiming Lu
LRM, LLMAG
511
95
0
04 Jan 2024
Olapa-MCoT: Enhancing the Chinese Mathematical Reasoning Capability of LLMs
Shaojie Zhu
Zhaobin Wang
Chengxiang Zhuo
Hui Lu
Bo Hu
Zang Li
LRM
127
0
0
29 Dec 2023
MR-GSM8K: A Meta-Reasoning Benchmark for Large Language Model Evaluation
Zhongshen Zeng
Pengguang Chen
Shu Liu
Haiyun Jiang
Jiaya Jia
ReLM, ELM, LRM
392
37
0
28 Dec 2023
Alleviating Hallucinations of Large Language Models through Induced Hallucinations
Yue Zhang
Leyang Cui
Wei Bi
Shuming Shi
HILM
299
74
0
25 Dec 2023
Prompt Valuation Based on Shapley Values
Hanxi Liu
Xiaokai Mao
Haocheng Xia
Jian Lou
Jinfei Liu
200
9
0
24 Dec 2023
Reasons to Reject? Aligning Language Models with Judgments
Weiwen Xu
Deng Cai
Zhisong Zhang
Wai Lam
Shuming Shi
ALM
346
17
0
22 Dec 2023
Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision
International Conference on Machine Learning (ICML), 2023
Collin Burns
Pavel Izmailov
Jan Hendrik Kirchner
Bowen Baker
Leo Gao
...
Adrien Ecoffet
Manas Joglekar
Jan Leike
Ilya Sutskever
Jeff Wu
ELM
344
382
0
14 Dec 2023
Math-Shepherd: Verify and Reinforce LLMs Step-by-step without Human Annotations
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Peiyi Wang
Lei Li
Zhihong Shao
R. X. Xu
Damai Dai
Yifei Li
Deli Chen
Y. Wu
Zhifang Sui
AIMat, LRM, ALM
442
662
0
14 Dec 2023
Alignment for Honesty
Neural Information Processing Systems (NeurIPS), 2023
Yuqing Yang
Ethan Chern
Xipeng Qiu
Graham Neubig
Pengfei Liu
257
58
0
12 Dec 2023
NLLG Quarterly arXiv Report 09/23: What are the most influential current AI Papers?
Ran Zhang
Aida Kostikova
Christoph Leiter
Jonas Belouadi
Daniil Larionov
Yanran Chen
Vivian Fresen
Steffen Eger
182
0
0
09 Dec 2023
Large Knowledge Model: Perspectives and Challenges
Data Intelligence (DI), 2023
Huajun Chen
KELM
352
22
0
05 Dec 2023
RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback
Computer Vision and Pattern Recognition (CVPR), 2023
Tianyu Yu
Yuan Yao
Haoye Zhang
Taiwen He
Yifeng Han
...
Xinyue Hu
Zhiyuan Liu
Hai-Tao Zheng
Maosong Sun
Tat-Seng Chua
MLLM, VLM
420
343
0
01 Dec 2023
LLM-Assisted Code Cleaning For Training Accurate Code Generators
International Conference on Learning Representations (ICLR), 2023
Naman Jain
Tianjun Zhang
Wei-Lin Chiang
Joseph E. Gonzalez
Koushik Sen
Ion Stoica
185
40
0
25 Nov 2023
Positional Description Matters for Transformers Arithmetic
Ruoqi Shen
Sébastien Bubeck
Ronen Eldan
Yin Tat Lee
Yuanzhi Li
Yi Zhang
265
56
0
22 Nov 2023
A Baseline Analysis of Reward Models' Ability To Accurately Analyze Foundation Models Under Distribution Shift
Will LeVine
Benjamin Pikus
Tony Chen
Sean Hendryx
592
16
0
21 Nov 2023
Igniting Language Intelligence: The Hitchhiker's Guide From Chain-of-Thought Reasoning to Language Agents
Zhuosheng Zhang
Yao Yao
Aston Zhang
Xiangru Tang
Xinbei Ma
...
Yiming Wang
Mark B. Gerstein
Rui Wang
Gongshen Liu
Hai Zhao
LLMAG, LM&Ro, LRM
357
91
0
20 Nov 2023
Meta Prompting for AI Systems
Yifan Zhang
Yang Yuan
Andrew Chi-Chih Yao
LLMAG, LRM
737
16
0
20 Nov 2023
OVM, Outcome-supervised Value Models for Planning in Mathematical Reasoning
Fei Yu
Anningzhe Gao
Benyou Wang
OffRL, LRM
228
82
0
16 Nov 2023