Let's Verify Step by Step


International Conference on Learning Representations (ICLR), 2023
31 May 2023
Hunter Lightman
V. Kosaraju
Yura Burda
Harrison Edwards
Bowen Baker
Teddy Lee
Jan Leike
John Schulman
Ilya Sutskever
K. Cobbe
    ALMOffRLLRM

Papers citing "Let's Verify Step by Step"

50 / 1,441 papers shown
DuetSim: Building User Simulator with Dual Large Language Models for Task-Oriented Dialogues
International Conference on Language Resources and Evaluation (LREC), 2024
Xiang Luo
Zhiwen Tang
Jin Wang
Xuejie Zhang
215
13
0
16 May 2024
IM-RAG: Multi-Round Retrieval-Augmented Generation Through Learning Inner Monologues
Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2024
Diji Yang
Jinmeng Rao
Kezhen Chen
Xiaoyuan Guo
Yawen Zhang
Jie Yang
Yi Zhang
LRMRALM
284
44
0
15 May 2024
LLMs can Find Mathematical Reasoning Mistakes by Pedagogical Chain-of-Thought
International Joint Conference on Artificial Intelligence (IJCAI), 2024
Zhuoxuan Jiang
Haoyuan Peng
Shanshan Feng
Fan Li
Dongsheng Li
KELMLRM
444
28
0
09 May 2024
Optimizing Language Model's Reasoning Abilities with Weak Supervision
Yongqi Tong
Sizhe Wang
Dawei Li
Yifan Wang
Simeng Han
Zi Lin
Chengsong Huang
Jiaxin Huang
Jingbo Shang
LRMReLM
243
13
0
07 May 2024
AlphaMath Almost Zero: Process Supervision without Process
Neural Information Processing Systems (NeurIPS), 2024
Guoxin Chen
Minpeng Liao
Chengxi Li
Kai Fan
AIMatLRM
273
171
0
06 May 2024
ATG: Benchmarking Automated Theorem Generation for Generative Language Models
Xiaohan Lin
Qingxing Cao
Yinya Huang
Zhicheng YANG
Zhengying Liu
Zhenguo Li
Xiaodan Liang
281
9
0
05 May 2024
The Real, the Better: Aligning Large Language Models with Online Human Behaviors
Guanying Jiang
Lingyong Yan
Haibo Shi
D. Yin
215
4
0
01 May 2024
Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning
Yuxi Xie
Anirudh Goyal
Wenyue Zheng
Min-Yen Kan
Timothy Lillicrap
Kenji Kawaguchi
Michael Shieh
ReLMLRM
412
197
0
01 May 2024
DPO Meets PPO: Reinforced Token Optimization for RLHF
Han Zhong
Zikang Shan
Guhao Feng
Wei Xiong
Xinle Cheng
Li Zhao
Di He
Jiang Bian
Liwei Wang
625
97
0
29 Apr 2024
Small Language Models Need Strong Verifiers to Self-Correct Reasoning
Yunxiang Zhang
Muhammad Khalifa
Lajanugen Logeswaran
Jaekyeom Kim
Moontae Lee
Honglak Lee
Lu Wang
LRMKELMReLM
325
72
0
26 Apr 2024
Tele-FLM Technical Report
Xiang Li
Yiqun Yao
Xin Jiang
Xuezhi Fang
Chao Wang
...
Yequan Wang
Zhongjiang He
Zhongyuan Wang
Xuelong Li
Tiejun Huang
209
11
0
25 Apr 2024
NExT: Teaching Large Language Models to Reason about Code Execution
Ansong Ni
Miltiadis Allamanis
Arman Cohan
Yinlin Deng
Kensen Shi
Charles Sutton
Pengcheng Yin
ReLMLRM
270
62
0
23 Apr 2024
Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing
Ye Tian
Baolin Peng
Linfeng Song
Lifeng Jin
Dian Yu
Haitao Mi
Dong Yu
LRMReLM
261
124
0
18 Apr 2024
Paraphrase and Solve: Exploring and Exploiting the Impact of Surface Form on Mathematical Reasoning in Large Language Models
Yue Zhou
Yada Zhu
Diego Antognini
Yoon Kim
Yang Zhang
ReLMLRM
104
9
0
17 Apr 2024
Many-Shot In-Context Learning
Rishabh Agarwal
Avi Singh
Lei M. Zhang
Bernd Bohnet
Luis Rosias
...
John D. Co-Reyes
Eric Chu
Feryal M. P. Behbahani
Aleksandra Faust
Hugo Larochelle
ReLMOffRLBDL
432
180
0
17 Apr 2024
Self-Explore to Avoid the Pit: Improving the Reasoning Capabilities of Language Models with Fine-grained Rewards
Hyeonbin Hwang
Doyoung Kim
Seungone Kim
Seonghyeon Ye
Minjoon Seo
LRMReLM
346
7
0
16 Apr 2024
RLHF Deciphered: A Critical Analysis of Reinforcement Learning from Human Feedback for LLMs
Shreyas Chaudhari
Pranjal Aggarwal
Vishvak Murahari
Tanmay Rajpurohit
Ashwin Kalyan
Karthik Narasimhan
Ameet Deshpande
Bruno Castro da Silva
407
88
0
12 Apr 2024
Rho-1: Not All Tokens Are What You Need
Zheng-Wen Lin
Zhibin Gou
Yeyun Gong
Xiao Liu
Haoran Pan
...
Chen Lin
Yujiu Yang
Jian Jiao
Nan Duan
Weizhu Chen
CLL
379
111
0
11 Apr 2024
Best Practices and Lessons Learned on Synthetic Data for Language Models
Ruibo Liu
Jerry W. Wei
Fangyu Liu
Chenglei Si
Yanzhe Zhang
...
Steven Zheng
Daiyi Peng
Diyi Yang
Denny Zhou
Andrew M. Dai
SyDaEgoV
304
112
0
11 Apr 2024
JetMoE: Reaching Llama2 Performance with 0.1M Dollars
Yikang Shen
Zhen Guo
Tianle Cai
Zengyi Qin
MoEALM
244
45
0
11 Apr 2024
Evaluating Mathematical Reasoning Beyond Accuracy
Shijie Xia
Xuefeng Li
Yixin Liu
Tongshuang Wu
Pengfei Liu
LRMReLM
336
54
0
08 Apr 2024
LLM Reasoners: New Evaluation, Library, and Analysis of Step-by-Step Reasoning with Large Language Models
Shibo Hao
Yi Gu
Haotian Luo
Tianyang Liu
Xiyan Shao
...
Haodi Ma
Adithya Samavedhi
Qiyue Gao
Zhen Wang
Zhiting Hu
LRMELM
291
1
0
08 Apr 2024
MM-MATH: Advancing Multimodal Math Evaluation with Process Evaluation and Fine-grained Classification
Kai Sun
Yushi Bai
Ji Qi
Lei Hou
Juanzi Li
LRM
288
39
0
07 Apr 2024
SAAS: Solving Ability Amplification Strategy for Enhanced Mathematical Reasoning in Large Language Models
Hyeonwoo Kim
Gyoungjin Gim
Yungi Kim
Jihoo Kim
Byungju Kim
Wonseok Lee
Chanjun Park
ReLMLRM
304
1
0
05 Apr 2024
Evaluating LLMs at Detecting Errors in LLM Responses
Ryo Kamoi
Sarkar Snigdha Sarathi Das
Renze Lou
Jihyun Janice Ahn
Yilun Zhao
...
Salika Dave
Shaobo Qin
Arman Cohan
Wenpeng Yin
Rui Zhang
217
46
0
04 Apr 2024
Conifer: Improving Complex Constrained Instruction-Following Ability of Large Language Models
Haoran Sun
Lixin Liu
Junjie Li
Fengyu Wang
Baohua Dong
Ran Lin
Ruohui Huang
198
23
0
03 Apr 2024
A Survey on Large Language Model-Based Game Agents
Sihao Hu
Tiansheng Huang
Gaowen Liu
Ramana Rao Kompella
Fatih Ilhan
Selim Furkan Tekin
Yichang Xu
Zachary Yahn
Ling Liu
AI4CELLMAGLM&RoLM&MA
680
107
0
02 Apr 2024
Stream of Search (SoS): Learning to Search in Language
Kanishk Gandhi
Denise Lee
Gabriel Grand
Muxin Liu
Winson Cheng
Archit Sharma
Noah D. Goodman
RALMAIFinLRM
263
114
0
01 Apr 2024
Comparing Bad Apples to Good Oranges: Aligning Large Language Models via Joint Preference Optimization
Hritik Bansal
Ashima Suvarna
Gantavya Bhatt
Nanyun Peng
Kai-Wei Chang
Aditya Grover
ALM
415
16
0
31 Mar 2024
Can LLMs Learn from Previous Mistakes? Investigating LLMs' Errors to Boost for Reasoning
Yongqi Tong
Dawei Li
Sizhe Wang
Yujia Wang
Fei Teng
Jingbo Shang
LRM
411
85
0
29 Mar 2024
Mitigating Misleading Chain-of-Thought Reasoning with Selective Filtering
Yexin Wu
Zhuosheng Zhang
Hai Zhao
LRM
193
9
0
28 Mar 2024
Learning From Correctness Without Prompting Makes LLM Efficient Reasoner
Yuxuan Yao
Han Wu
Zhijiang Guo
Biyan Zhou
Jiahui Gao
Sichun Luo
Hanxu Hou
Mingwen Liu
Linqi Song
LLMAGLRM
342
14
0
28 Mar 2024
Improving Attributed Text Generation of Large Language Models via Preference Learning
Dongfang Li
Zetian Sun
Baotian Hu
Zhenyu Liu
Xinshuo Hu
Xuebo Liu
Min Zhang
191
23
0
27 Mar 2024
RewardBench: Evaluating Reward Models for Language Modeling
Nathan Lambert
Valentina Pyatkin
Jacob Morrison
Lester James V. Miranda
Bill Yuchen Lin
...
Sachin Kumar
Tom Zick
Yejin Choi
Noah A. Smith
Hanna Hajishirzi
ALM
468
335
0
20 Mar 2024
RankPrompt: Step-by-Step Comparisons Make Language Models Better Reasoners
International Conference on Language Resources and Evaluation (LREC), 2024
Chi Hu
Yuan Ge
Xiangnan Ma
Hang Cao
Qiang Li
Yonghua Yang
Tong Xiao
Jingbo Zhu
ReLMELMLRMALM
317
10
0
19 Mar 2024
Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision
Neural Information Processing Systems (NeurIPS), 2024
Zhiqing Sun
Longhui Yu
Yikang Shen
Weiyang Liu
Yiming Yang
Sean Welleck
Chuang Gan
233
92
0
14 Mar 2024
ALaRM: Align Language Models via Hierarchical Rewards Modeling
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Yuhang Lai
Siyuan Wang
Shujun Liu
Xuanjing Huang
Zhongyu Wei
280
8
0
11 Mar 2024
Bias-Augmented Consistency Training Reduces Biased Reasoning in Chain-of-Thought
James Chua
Edward Rees
Hunar Batra
Samuel R. Bowman
Julian Michael
Ethan Perez
Miles Turpin
LRM
312
23
0
08 Mar 2024
Common 7B Language Models Already Possess Strong Math Capabilities
Chen Li
Weiqi Wang
Jingcheng Hu
Yixuan Wei
Nanning Zheng
Han Hu
Zheng Zhang
Houwen Peng
ALMLRM
213
111
0
07 Mar 2024
Teaching Large Language Models to Reason with Reinforcement Learning
Alex Havrilla
Yuqing Du
Sharath Chandra Raparthy
Christoforos Nalmpantis
Jane Dwivedi-Yu
Maksym Zhuravinskyi
Eric Hambro
Sainbayar Sukhbaatar
Roberta Raileanu
ReLMLRM
265
142
0
07 Mar 2024
DACO: Towards Application-Driven and Comprehensive Data Analysis via Code Generation
Xueqing Wu
Rui Zheng
Jingzhen Sha
Te-Lin Wu
Hanyu Zhou
Mohan Tang
Kai-Wei Chang
Nanyun Peng
Haoran Huang
246
5
0
04 Mar 2024
Trial and Error: Exploration-Based Trajectory Optimization for LLM Agents
Yifan Song
Da Yin
Xiang Yue
Jie Huang
Sujian Li
Bill Yuchen Lin
292
134
0
04 Mar 2024
Masked Thought: Simply Masking Partial Reasoning Steps Can Improve Mathematical Reasoning Learning of Language Models
Changyu Chen
Xiting Wang
Ting-En Lin
Ang Lv
Yuchuan Wu
Xin Gao
Ji-Rong Wen
Rui Yan
Yongbin Li
ReLMLRM
245
20
0
04 Mar 2024
From Large Language Models and Optimization to Decision Optimization CoPilot: A Research Manifesto
Segev Wasserkrug
Léonard Boussioux
D. Hertog
F. Mirzazadeh
Ilker Birbil
Jannis Kurtz
Donato Maragno
LLMAG
275
15
0
26 Feb 2024
Debug like a Human: A Large Language Model Debugger via Verifying Runtime Execution Step-by-step
Li Zhong
Zilong Wang
Jingbo Shang
439
121
0
25 Feb 2024
Stepwise Self-Consistent Mathematical Reasoning with Large Language Models
Zilong Zhao
Yao Rong
Dongyang Guo
Emek Gözlüklü
Emir Gülboy
Enkelejda Kasneci
LRM
268
4
0
24 Feb 2024
Fine-Grained Self-Endorsement Improves Factuality and Reasoning
Ante Wang
Linfeng Song
Baolin Peng
Ye Tian
Lifeng Jin
Haitao Mi
Jinsong Su
Dong Yu
HILMLRM
151
9
0
23 Feb 2024
CriticBench: Benchmarking LLMs for Critique-Correct Reasoning
Zicheng Lin
Zhibin Gou
Tian Liang
Ruilin Luo
Haowei Liu
Yujiu Yang
LRM
404
78
0
22 Feb 2024
Mafin: Enhancing Black-Box Embeddings with Model Augmented Fine-Tuning
Mingtian Zhang
Shawn Lan
Peter Hayes
David Barber
455
4
0
19 Feb 2024
DiLA: Enhancing LLM Tool Learning with Differential Logic Layer
Yu Zhang
Hui-Ling Zhen
Zehua Pei
Yingzhao Lian
Lihao Yin
Mingxuan Yuan
Bei Yu
LRM
311
4
0
19 Feb 2024