Communities
Connect sessions
AI calendar
Organizations
Join Slack
Contact Sales
Search
Open menu
Home
Papers
2305.20050
Cited By
Let's Verify Step by Step
International Conference on Learning Representations (ICLR), 2023
31 May 2023
Hunter Lightman
V. Kosaraju
Yura Burda
Harrison Edwards
Bowen Baker
Teddy Lee
Jan Leike
John Schulman
Ilya Sutskever
K. Cobbe
ALM
OffRL
LRM
Re-assign community
ArXiv (abs)
PDF
HTML
HuggingFace (10 upvotes)
Papers citing
"Let's Verify Step by Step"
50 / 1,441 papers shown
RLHF Can Speak Many Languages: Unlocking Multilingual Preference Optimization for LLMs
John Dang
Arash Ahmadian
Kelly Marchisio
Julia Kreutzer
Ahmet Üstün
Sara Hooker
247
43
0
02 Jul 2024
DogeRM: Equipping Reward Models with Domain Knowledge through Model Merging
Tzu-Han Lin
Chen-An Li
Hung-yi Lee
Yun-Nung Chen
VLM
ALM
139
6
0
01 Jul 2024
DynaThink: Fast or Slow? A Dynamic Decision-Making Framework for Large Language Models
Jiabao Pan
Yan Zhang
Chen Zhang
Zuozhu Liu
Hongwei Wang
Haizhou Li
LRM
138
15
0
01 Jul 2024
Step-Controlled DPO: Leveraging Stepwise Error for Enhanced Mathematical Reasoning
Zimu Lu
Aojun Zhou
Ke Wang
Houxing Ren
Weikang Shi
Junting Pan
Mingjie Zhan
Hongsheng Li
LRM
290
36
0
30 Jun 2024
Advancing Process Verification for Large Language Models via Tree-Based Preference Learning
Mingqian He
Yongliang Shen
Wenqi Zhang
Zeqi Tan
Weiming Lu
LRM
226
13
0
29 Jun 2024
LiteSearch: Efficacious Tree Search for LLM
Ante Wang
Linfeng Song
Ye Tian
Baolin Peng
Dian Yu
Haitao Mi
Jinsong Su
Dong Yu
246
32
0
29 Jun 2024
Covert Malicious Finetuning: Challenges in Safeguarding LLM Adaptation
Danny Halawi
Alexander Wei
Eric Wallace
Tony T. Wang
Nika Haghtalab
Jacob Steinhardt
SILM
AAML
227
58
0
28 Jun 2024
The SIFo Benchmark: Investigating the Sequential Instruction Following Ability of Large Language Models
Xinyi Chen
Baohao Liao
Jirui Qi
Panagiotis Eustratiadis
Christof Monz
Arianna Bisazza
Maarten de Rijke
ALM
ELM
LRM
222
11
0
28 Jun 2024
Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs
Xin Lai
Zhuotao Tian
Yukang Chen
Senqiao Yang
Xiangru Peng
Jiaya Jia
LRM
414
212
0
26 Jun 2024
ARES: Alternating Reinforcement Learning and Supervised Fine-Tuning for Enhanced Multi-Modal Chain-of-Thought Reasoning Through Diverse AI Feedback
Ju-Seung Byun
Jiyun Chun
Jihyung Kil
Andrew Perrault
ReLM
LRM
313
12
0
25 Jun 2024
From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models
Sean Welleck
Amanda Bertsch
Matthew Finlayson
Hailey Schoelkopf
Alex Xie
Graham Neubig
Ilia Kulikov
Zaid Harchaoui
374
110
0
24 Jun 2024
Task Oriented In-Domain Data Augmentation
Xiao Liang
Xinyu Hu
Simiao Zuo
Yeyun Gong
Qiang Lou
Yi Liu
Shao-Lun Huang
Jian Jiao
194
8
0
24 Jun 2024
On the Transformations across Reward Model, Parameter Update, and In-Context Prompt
Deng Cai
Huayang Li
Tingchen Fu
Siheng Li
Weiwen Xu
...
Leyang Cui
Yan Wang
Lemao Liu
Taro Watanabe
Shuming Shi
KELM
232
2
0
24 Jun 2024
VideoHallucer: Evaluating Intrinsic and Extrinsic Hallucinations in Large Video-Language Models
Yuxuan Wang
Yueqian Wang
Dongyan Zhao
Cihang Xie
Zilong Zheng
MLLM
VLM
261
53
0
24 Jun 2024
CAVE: Controllable Authorship Verification Explanations
Sahana Ramnath
Kartik Pandey
Elizabeth Boschee
Xiang Ren
388
3
0
24 Jun 2024
Chain-of-Probe: Examining the Necessity and Accuracy of CoT Step-by-Step
Zezhong Wang
Xingshan Zeng
Weiwen Liu
Yufei Wang
Liangyou Li
Yasheng Wang
Lifeng Shang
Xin Jiang
Qun Liu
Kam-Fai Wong
LRM
255
5
0
23 Jun 2024
PORT: Preference Optimization on Reasoning Traces
Salem Lahlou
Abdalgader Abubaker
Hakim Hacid
LRM
331
7
0
23 Jun 2024
Q*: Improving Multi-step Reasoning for LLMs with Deliberative Planning
Chaojie Wang
Yanchen Deng
Zhiyi Lyu
Liang Zeng
Jujie He
Shuicheng Yan
Bo An
LRM
ReLM
349
95
0
20 Jun 2024
LLM Critics Help Catch Bugs in Mathematics: Towards a Better Mathematical Verifier with Natural Language Feedback
Bofei Gao
Zefan Cai
Runxin Xu
Peiyi Wang
Ce Zheng
...
Chang Zhou
Wen Xiao
Junjie Hu
Tianyu Liu
Baobao Chang
LRM
341
40
0
20 Jun 2024
Interpretable Preferences via Multi-Objective Reward Modeling and Mixture-of-Experts
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Haoxiang Wang
Wei Xiong
Tengyang Xie
Han Zhao
Tong Zhang
296
302
0
18 Jun 2024
DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving
Yuxuan Tong
Xiwen Zhang
Rui Wang
R. Wu
Junxian He
AIMat
LRM
240
81
0
18 Jun 2024
OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI
Zhen Huang
Zengzhi Wang
Shijie Xia
Xuefeng Li
Haoyang Zou
...
Yuxiang Zheng
Shaoting Zhang
Dahua Lin
Yu Qiao
Pengfei Liu
ELM
LRM
299
71
0
18 Jun 2024
Dialogue Action Tokens: Steering Language Models in Goal-Directed Dialogue with a Multi-Turn Planner
Kenneth Li
Yiming Wang
Fernanda Viégas
Martin Wattenberg
268
10
0
17 Jun 2024
Nemotron-4 340B Technical Report
Nvidia
:
Bo Adler
Niket Agarwal
Ashwath Aithal
...
Jimmy Zhang
Jing Zhang
Vivienne Zhang
Yian Zhang
Chen Zhu
303
111
0
17 Jun 2024
GeoGPT4V: Towards Geometric Multi-modal Large Language Models with Geometric Image Generation
Shihao Cai
Keqin Bao
Hangyu Guo
Jizhi Zhang
Jun Song
Bo Zheng
185
28
0
17 Jun 2024
A Survey on Human Preference Learning for Large Language Models
Ruili Jiang
Kehai Chen
Xuefeng Bai
Zhixuan He
Juntao Li
Muyun Yang
Tiejun Zhao
Liqiang Nie
Min Zhang
282
16
0
17 Jun 2024
Watch Every Step! LLM Agent Learning via Iterative Step-Level Process Refinement
Weimin Xiong
Yifan Song
Xiutian Zhao
Wenhao Wu
Xun Wang
Ke Wang
Cheng Li
Wei Peng
Sujian Li
243
64
0
17 Jun 2024
HelpSteer2: Open-source dataset for training top-performing reward models
Zhilin Wang
Yi Dong
Olivier Delalleau
Jiaqi Zeng
Gerald Shen
Daniel Egert
Jimmy J. Zhang
Makesh Narsimhan Sreedhar
Oleksii Kuchaiev
AI4TS
312
163
0
12 Jun 2024
Discovering Preference Optimization Algorithms with and for Large Language Models
Chris Xiaoxuan Lu
Samuel Holt
Claudio Fanconi
Alex J. Chan
Jakob Foerster
M. Schaar
R. T. Lange
OffRL
315
26
0
12 Jun 2024
A Critical Look At Tokenwise Reward-Guided Text Generation
Ahmad Rashid
Ruotian Wu
Julia Grosse
Agustinus Kristiadi
Pascal Poupart
OffRL
601
5
0
12 Jun 2024
TextGrad: Automatic "Differentiation" via Text
Mert Yuksekgonul
Federico Bianchi
Joseph Boen
Sheng Liu
Zhi Huang
Carlos Guestrin
James Zou
LLMAG
OOD
AI4CE
355
95
0
11 Jun 2024
PowerInfer-2: Fast Large Language Model Inference on a Smartphone
Zhenliang Xue
Yixin Song
Zeyu Mi
Le Chen
Yubin Xia
Haibo Chen
326
90
0
10 Jun 2024
The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Seungone Kim
Juyoung Suk
Ji Yong Cho
Shayne Longpre
Chaeeun Kim
...
Sean Welleck
Graham Neubig
Moontae Lee
Kyungjae Lee
Minjoon Seo
ELM
ALM
LM&MA
427
71
0
09 Jun 2024
Improve Mathematical Reasoning in Language Models by Automated Process Supervision
Liangchen Luo
Yinxiao Liu
Rosanne Liu
Samrat Phatale
Harsh Lara
...
Lei Shu
Yun Zhu
Lei Meng
Jiao Sun
Abhinav Rastogi
LRM
308
317
0
05 Jun 2024
Exploring Mathematical Extrapolation of Large Language Models with Synthetic Data
Haolong Li
Yu Ma
Yinqi Zhang
Chen Ye
Jie Chen
ReLM
LRM
195
5
0
04 Jun 2024
Process-Driven Autoformalization in Lean 4
Jianqiao Lu
Zhengying Liu
Yingjia Wan
Yinya Huang
Haiming Wang
Zhicheng YANG
Jing Tang
Zhijiang Guo
AI4CE
394
34
0
04 Jun 2024
When Can LLMs Actually Correct Their Own Mistakes? A Critical Survey of Self-Correction of LLMs
Ryo Kamoi
Yusen Zhang
Nan Zhang
Jiawei Han
Rui Zhang
LRM
385
153
0
03 Jun 2024
SemCoder: Training Code Language Models with Comprehensive Semantics
Yangruibo Ding
Jinjun Peng
Marcus J. Min
Gail E. Kaiser
Junfeng Yang
Baishakhi Ray
OffRL
289
35
0
03 Jun 2024
Improving Reward Models with Synthetic Critiques
Zihuiwen Ye
Fraser Greenlee-Scott
Max Bartolo
Phil Blunsom
Jon Ander Campos
Matthias Gallé
ALM
SyDa
LRM
269
37
0
31 May 2024
ANAH: Analytical Annotation of Hallucinations in Large Language Models
Ziwei Ji
Yuzhe Gu
Wenwei Zhang
Chengqi Lyu
Dahua Lin
Kai-xiang Chen
HILM
203
8
0
30 May 2024
Weak-to-Strong Search: Align Large Language Models via Searching over Small Language Models
Zhanhui Zhou
Zhixuan Liu
Jie Liu
Zhichen Dong
Chao Yang
Yu Qiao
ALM
296
36
0
29 May 2024
Exploring the LLM Journey from Cognition to Expression with Linear Representations
Yuzi Yan
J. Li
Yipin Zhang
Dong Yan
234
5
0
27 May 2024
MindStar: Enhancing Math Reasoning in Pre-trained LLMs at Inference Time
Jikun Kang
Xin Zhe Li
Xi Chen
Amirreza Kazemi
Qianyi Sun
...
Xu He
Quan He
Feng Wen
Jianye Hao
Jun Yao
LRM
ReLM
296
35
0
25 May 2024
Models That Prove Their Own Correctness
Noga Amit
S. Goldwasser
Orr Paradise
G. Rothblum
LRM
452
5
0
24 May 2024
SimPO: Simple Preference Optimization with a Reference-Free Reward
Neural Information Processing Systems (NeurIPS), 2024
Yu Meng
Mengzhou Xia
Danqi Chen
538
785
0
23 May 2024
Calibrated Self-Rewarding Vision Language Models
Neural Information Processing Systems (NeurIPS), 2024
Yiyang Zhou
Zhiyuan Fan
Dongjie Cheng
Sihan Yang
Zhaorun Chen
Chenhang Cui
Xiyao Wang
Yun Li
Linjun Zhang
Huaxiu Yao
VLM
302
65
0
23 May 2024
Tutorly: Turning Programming Videos Into Apprenticeship Learning Environments with LLMs
Wengxi Li
Roy Pea
Nick Haber
Hari Subramonyam
151
7
0
21 May 2024
Hummer: Towards Limited Competitive Preference Dataset
Li Jiang
Yusen Wu
Junwu Xiong
Jingqing Ruan
Yichuan Ding
Qingpei Guo
ZuJie Wen
Jun Zhou
Xiaotie Deng
397
10
0
19 May 2024
Generative Artificial Intelligence: A Systematic Review and Applications
S. S. Sengar
Affan Bin Hasan
Sanjay Kumar
Fiona Carroll
MedIm
298
227
0
17 May 2024
Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning
Neural Information Processing Systems (NeurIPS), 2024
Yuexiang Zhai
Hao Bai
Zipeng Lin
Jiayi Pan
Shengbang Tong
...
Alane Suhr
Saining Xie
Yann LeCun
Yi-An Ma
Sergey Levine
LLMAG
LRM
354
132
0
16 May 2024
Previous
1
2
3
...
24
25
26
27
28
29
Next