Let's Verify Step by Step

International Conference on Learning Representations (ICLR), 2023
31 May 2023
Hunter Lightman
V. Kosaraju
Yura Burda
Harrison Edwards
Bowen Baker
Teddy Lee
Jan Leike
John Schulman
Ilya Sutskever
K. Cobbe
ALM, OffRL, LRM

Papers citing "Let's Verify Step by Step"

50 / 1,441 papers shown
RLHF Can Speak Many Languages: Unlocking Multilingual Preference Optimization for LLMs
John Dang
Arash Ahmadian
Kelly Marchisio
Julia Kreutzer
Ahmet Üstün
Sara Hooker
247
43
0
02 Jul 2024
DogeRM: Equipping Reward Models with Domain Knowledge through Model Merging
Tzu-Han Lin
Chen-An Li
Hung-yi Lee
Yun-Nung Chen
VLM, ALM
139
6
0
01 Jul 2024
DynaThink: Fast or Slow? A Dynamic Decision-Making Framework for Large Language Models
Jiabao Pan
Yan Zhang
Chen Zhang
Zuozhu Liu
Hongwei Wang
Haizhou Li
LRM
138
15
0
01 Jul 2024
Step-Controlled DPO: Leveraging Stepwise Error for Enhanced Mathematical Reasoning
Zimu Lu
Aojun Zhou
Ke Wang
Houxing Ren
Weikang Shi
Junting Pan
Mingjie Zhan
Hongsheng Li
LRM
290
36
0
30 Jun 2024
Advancing Process Verification for Large Language Models via Tree-Based Preference Learning
Mingqian He
Yongliang Shen
Wenqi Zhang
Zeqi Tan
Weiming Lu
LRM
226
13
0
29 Jun 2024
LiteSearch: Efficacious Tree Search for LLM
Ante Wang
Linfeng Song
Ye Tian
Baolin Peng
Dian Yu
Haitao Mi
Jinsong Su
Dong Yu
246
32
0
29 Jun 2024
Covert Malicious Finetuning: Challenges in Safeguarding LLM Adaptation
Danny Halawi
Alexander Wei
Eric Wallace
Tony T. Wang
Nika Haghtalab
Jacob Steinhardt
SILM, AAML
227
58
0
28 Jun 2024
The SIFo Benchmark: Investigating the Sequential Instruction Following Ability of Large Language Models
Xinyi Chen
Baohao Liao
Jirui Qi
Panagiotis Eustratiadis
Christof Monz
Arianna Bisazza
Maarten de Rijke
ALM, ELM, LRM
222
11
0
28 Jun 2024
Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs
Xin Lai
Zhuotao Tian
Yukang Chen
Senqiao Yang
Xiangru Peng
Jiaya Jia
LRM
414
212
0
26 Jun 2024
ARES: Alternating Reinforcement Learning and Supervised Fine-Tuning for Enhanced Multi-Modal Chain-of-Thought Reasoning Through Diverse AI Feedback
Ju-Seung Byun
Jiyun Chun
Jihyung Kil
Andrew Perrault
ReLM, LRM
313
12
0
25 Jun 2024
From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models
Sean Welleck
Amanda Bertsch
Matthew Finlayson
Hailey Schoelkopf
Alex Xie
Graham Neubig
Ilia Kulikov
Zaid Harchaoui
374
110
0
24 Jun 2024
Task Oriented In-Domain Data Augmentation
Xiao Liang
Xinyu Hu
Simiao Zuo
Yeyun Gong
Qiang Lou
Yi Liu
Shao-Lun Huang
Jian Jiao
194
8
0
24 Jun 2024
On the Transformations across Reward Model, Parameter Update, and In-Context Prompt
Deng Cai
Huayang Li
Tingchen Fu
Siheng Li
Weiwen Xu
...
Leyang Cui
Yan Wang
Lemao Liu
Taro Watanabe
Shuming Shi
KELM
232
2
0
24 Jun 2024
VideoHallucer: Evaluating Intrinsic and Extrinsic Hallucinations in Large Video-Language Models
Yuxuan Wang
Yueqian Wang
Dongyan Zhao
Cihang Xie
Zilong Zheng
MLLM, VLM
261
53
0
24 Jun 2024
CAVE: Controllable Authorship Verification Explanations
Sahana Ramnath
Kartik Pandey
Elizabeth Boschee
Xiang Ren
388
3
0
24 Jun 2024
Chain-of-Probe: Examining the Necessity and Accuracy of CoT Step-by-Step
Zezhong Wang
Xingshan Zeng
Weiwen Liu
Yufei Wang
Liangyou Li
Yasheng Wang
Lifeng Shang
Xin Jiang
Qun Liu
Kam-Fai Wong
LRM
255
5
0
23 Jun 2024
PORT: Preference Optimization on Reasoning Traces
Salem Lahlou
Abdalgader Abubaker
Hakim Hacid
LRM
331
7
0
23 Jun 2024
Q*: Improving Multi-step Reasoning for LLMs with Deliberative Planning
Chaojie Wang
Yanchen Deng
Zhiyi Lyu
Liang Zeng
Jujie He
Shuicheng Yan
Bo An
LRM, ReLM
349
95
0
20 Jun 2024
LLM Critics Help Catch Bugs in Mathematics: Towards a Better Mathematical Verifier with Natural Language Feedback
Bofei Gao
Zefan Cai
Runxin Xu
Peiyi Wang
Ce Zheng
...
Chang Zhou
Wen Xiao
Junjie Hu
Tianyu Liu
Baobao Chang
LRM
341
40
0
20 Jun 2024
Interpretable Preferences via Multi-Objective Reward Modeling and Mixture-of-Experts
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Haoxiang Wang
Wei Xiong
Tengyang Xie
Han Zhao
Tong Zhang
296
302
0
18 Jun 2024
DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving
Yuxuan Tong
Xiwen Zhang
Rui Wang
R. Wu
Junxian He
AIMat, LRM
240
81
0
18 Jun 2024
OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI
Zhen Huang
Zengzhi Wang
Shijie Xia
Xuefeng Li
Haoyang Zou
...
Yuxiang Zheng
Shaoting Zhang
Dahua Lin
Yu Qiao
Pengfei Liu
ELM, LRM
299
71
0
18 Jun 2024
Dialogue Action Tokens: Steering Language Models in Goal-Directed Dialogue with a Multi-Turn Planner
Kenneth Li
Yiming Wang
Fernanda Viégas
Martin Wattenberg
268
10
0
17 Jun 2024
Nemotron-4 340B Technical Report
Nvidia
Bo Adler
Niket Agarwal
Ashwath Aithal
...
Jimmy Zhang
Jing Zhang
Vivienne Zhang
Yian Zhang
Chen Zhu
303
111
0
17 Jun 2024
GeoGPT4V: Towards Geometric Multi-modal Large Language Models with Geometric Image Generation
Shihao Cai
Keqin Bao
Hangyu Guo
Jizhi Zhang
Jun Song
Bo Zheng
185
28
0
17 Jun 2024
A Survey on Human Preference Learning for Large Language Models
Ruili Jiang
Kehai Chen
Xuefeng Bai
Zhixuan He
Juntao Li
Muyun Yang
Tiejun Zhao
Liqiang Nie
Min Zhang
282
16
0
17 Jun 2024
Watch Every Step! LLM Agent Learning via Iterative Step-Level Process Refinement
Weimin Xiong
Yifan Song
Xiutian Zhao
Wenhao Wu
Xun Wang
Ke Wang
Cheng Li
Wei Peng
Sujian Li
243
64
0
17 Jun 2024
HelpSteer2: Open-source dataset for training top-performing reward models
Zhilin Wang
Yi Dong
Olivier Delalleau
Jiaqi Zeng
Gerald Shen
Daniel Egert
Jimmy J. Zhang
Makesh Narsimhan Sreedhar
Oleksii Kuchaiev
AI4TS
312
163
0
12 Jun 2024
Discovering Preference Optimization Algorithms with and for Large Language Models
Chris Xiaoxuan Lu
Samuel Holt
Claudio Fanconi
Alex J. Chan
Jakob Foerster
M. Schaar
R. T. Lange
OffRL
315
26
0
12 Jun 2024
A Critical Look At Tokenwise Reward-Guided Text Generation
Ahmad Rashid
Ruotian Wu
Julia Grosse
Agustinus Kristiadi
Pascal Poupart
OffRL
601
5
0
12 Jun 2024
TextGrad: Automatic "Differentiation" via Text
Mert Yuksekgonul
Federico Bianchi
Joseph Boen
Sheng Liu
Zhi Huang
Carlos Guestrin
James Zou
LLMAG, OOD, AI4CE
355
95
0
11 Jun 2024
PowerInfer-2: Fast Large Language Model Inference on a Smartphone
Zhenliang Xue
Yixin Song
Zeyu Mi
Le Chen
Yubin Xia
Haibo Chen
326
90
0
10 Jun 2024
The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Seungone Kim
Juyoung Suk
Ji Yong Cho
Shayne Longpre
Chaeeun Kim
...
Sean Welleck
Graham Neubig
Moontae Lee
Kyungjae Lee
Minjoon Seo
ELM, ALM, LM&MA
427
71
0
09 Jun 2024
Improve Mathematical Reasoning in Language Models by Automated Process Supervision
Liangchen Luo
Yinxiao Liu
Rosanne Liu
Samrat Phatale
Harsh Lara
...
Lei Shu
Yun Zhu
Lei Meng
Jiao Sun
Abhinav Rastogi
LRM
308
317
0
05 Jun 2024
Exploring Mathematical Extrapolation of Large Language Models with Synthetic Data
Haolong Li
Yu Ma
Yinqi Zhang
Chen Ye
Jie Chen
ReLM, LRM
195
5
0
04 Jun 2024
Process-Driven Autoformalization in Lean 4
Jianqiao Lu
Zhengying Liu
Yingjia Wan
Yinya Huang
Haiming Wang
Zhicheng YANG
Jing Tang
Zhijiang Guo
AI4CE
394
34
0
04 Jun 2024
When Can LLMs Actually Correct Their Own Mistakes? A Critical Survey of Self-Correction of LLMs
Ryo Kamoi
Yusen Zhang
Nan Zhang
Jiawei Han
Rui Zhang
LRM
385
153
0
03 Jun 2024
SemCoder: Training Code Language Models with Comprehensive Semantics
Yangruibo Ding
Jinjun Peng
Marcus J. Min
Gail E. Kaiser
Junfeng Yang
Baishakhi Ray
OffRL
289
35
0
03 Jun 2024
Improving Reward Models with Synthetic Critiques
Zihuiwen Ye
Fraser Greenlee-Scott
Max Bartolo
Phil Blunsom
Jon Ander Campos
Matthias Gallé
ALM, SyDa, LRM
269
37
0
31 May 2024
ANAH: Analytical Annotation of Hallucinations in Large Language Models
Ziwei Ji
Yuzhe Gu
Wenwei Zhang
Chengqi Lyu
Dahua Lin
Kai-xiang Chen
HILM
203
8
0
30 May 2024
Weak-to-Strong Search: Align Large Language Models via Searching over Small Language Models
Zhanhui Zhou
Zhixuan Liu
Jie Liu
Zhichen Dong
Chao Yang
Yu Qiao
ALM
296
36
0
29 May 2024
Exploring the LLM Journey from Cognition to Expression with Linear Representations
Yuzi Yan
J. Li
Yipin Zhang
Dong Yan
234
5
0
27 May 2024
MindStar: Enhancing Math Reasoning in Pre-trained LLMs at Inference Time
Jikun Kang
Xin Zhe Li
Xi Chen
Amirreza Kazemi
Qianyi Sun
...
Xu He
Quan He
Feng Wen
Jianye Hao
Jun Yao
LRM, ReLM
296
35
0
25 May 2024
Models That Prove Their Own Correctness
Noga Amit
S. Goldwasser
Orr Paradise
G. Rothblum
LRM
452
5
0
24 May 2024
SimPO: Simple Preference Optimization with a Reference-Free Reward
Neural Information Processing Systems (NeurIPS), 2024
Yu Meng
Mengzhou Xia
Danqi Chen
538
785
0
23 May 2024
Calibrated Self-Rewarding Vision Language Models
Neural Information Processing Systems (NeurIPS), 2024
Yiyang Zhou
Zhiyuan Fan
Dongjie Cheng
Sihan Yang
Zhaorun Chen
Chenhang Cui
Xiyao Wang
Yun Li
Linjun Zhang
Huaxiu Yao
VLM
302
65
0
23 May 2024
Tutorly: Turning Programming Videos Into Apprenticeship Learning Environments with LLMs
Wengxi Li
Roy Pea
Nick Haber
Hari Subramonyam
151
7
0
21 May 2024
Hummer: Towards Limited Competitive Preference Dataset
Li Jiang
Yusen Wu
Junwu Xiong
Jingqing Ruan
Yichuan Ding
Qingpei Guo
ZuJie Wen
Jun Zhou
Xiaotie Deng
397
10
0
19 May 2024
Generative Artificial Intelligence: A Systematic Review and Applications
S. S. Sengar
Affan Bin Hasan
Sanjay Kumar
Fiona Carroll
MedIm
298
227
0
17 May 2024
Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning
Neural Information Processing Systems (NeurIPS), 2024
Yuexiang Zhai
Hao Bai
Zipeng Lin
Jiayi Pan
Shengbang Tong
...
Alane Suhr
Saining Xie
Yann LeCun
Yi-An Ma
Sergey Levine
LLMAG, LRM
354
132
0
16 May 2024