Let's Verify Step by Step


International Conference on Learning Representations (ICLR), 2023
31 May 2023
Hunter Lightman
V. Kosaraju
Yura Burda
Harrison Edwards
Bowen Baker
Teddy Lee
Jan Leike
John Schulman
Ilya Sutskever
K. Cobbe
ALM, OffRL, LRM

Papers citing "Let's Verify Step by Step"

50 / 1,441 papers shown
LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level Mathematical Reasoning
Di Zhang
Jianbo Wu
Jingdi Lei
Tong Che
Jiatong Li
...
Shufei Zhang
Marco Pavone
Yuqiang Li
Wanli Ouyang
Dongzhan Zhou
LRM
260
89
0
03 Oct 2024
GraphIC: A Graph-Based In-Context Example Retrieval Model for Multi-Step Reasoning
Jiale Fu
Yaqing Wang
Simeng Han
Jiaming Fan
Chen Si
490
1
0
03 Oct 2024
ReGenesis: LLMs can Grow into Reasoning Generalists via Self-Improvement
International Conference on Learning Representations (ICLR), 2024
Xiangyu Peng
Congying Xia
Xinyi Yang
Caiming Xiong
Chien-Sheng Wu
Chen Xing
LRM
321
14
0
03 Oct 2024
CodePMP: Scalable Preference Model Pretraining for Large Language Model Reasoning
Huimu Yu
Xing Wu
Weidong Yin
Debing Zhang
Songlin Hu
LRM
314
7
0
03 Oct 2024
Learning to Better Search with Language Models via Guided Reinforced Self-Training
Seungyong Moon
Bumsoo Park
Hyun Oh Song
AIFin, RALM
281
4
0
03 Oct 2024
Training Nonlinear Transformers for Chain-of-Thought Inference: A Theoretical Generalization Analysis
International Conference on Learning Representations (ICLR), 2024
Hongkang Li
Songtao Lu
Pin-Yu Chen
Xiaodong Cui
Meng Wang
LRM
492
11
0
03 Oct 2024
Beyond Bradley-Terry Models: A General Preference Model for Language Model Alignment
Yifan Zhang
Ge Zhang
Yue Wu
Kangping Xu
Quanquan Gu
484
3
0
03 Oct 2024
Evaluating Robustness of Reward Models for Mathematical Reasoning
Sunghwan Kim
Dongjin Kang
Taeyoon Kwon
Hyungjoo Chae
Jungsoo Won
Dongha Lee
Jinyoung Yeo
199
15
0
02 Oct 2024
Can We Further Elicit Reasoning in LLMs? Critic-Guided Planning with Retrieval-Augmentation for Solving Challenging Tasks
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Xingxuan Li
Weiwen Xu
Ruochen Zhao
Fangkai Jiao
Shafiq Joty
Lidong Bing
LRM
264
24
0
02 Oct 2024
Closed-Loop Long-Horizon Robotic Planning via Equilibrium Sequence Modeling
Jinghan Li
Zhicheng Sun
Fei Li
774
2
0
02 Oct 2024
TypedThinker: Diversify Large Language Model Reasoning with Typed Thinking
International Conference on Learning Representations (ICLR), 2024
Danqing Wang
Jianxin Ma
Fei Fang
Lei Li
LLMAG, LRM
904
2
0
02 Oct 2024
LASeR: Learning to Adaptively Select Reward Models with Multi-Armed Bandits
Duy Nguyen
Archiki Prasad
Elias Stengel-Eskin
Joey Tianyi Zhou
456
5
0
02 Oct 2024
Step-by-Step Reasoning for Math Problems via Twisted Sequential Monte Carlo
International Conference on Learning Representations (ICLR), 2024
Shengyu Feng
Xiang Kong
Shuang Ma
Aonan Zhang
Dong Yin
Chong-Jun Wang
Ruoming Pang
Yiming Yang
LRM
442
7
0
02 Oct 2024
RATIONALYST: Mining Implicit Rationales for Process Supervision of Reasoning
Dongwei Jiang
Guoxuan Wang
Yining Lu
Andrew Wang
Jingyu Zhang
Chuyu Liu
Benjamin Van Durme
Daniel Khashabi
LRM, ReLM
215
3
0
01 Oct 2024
Inference-Time Language Model Alignment via Integrated Value Guidance
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Zhixuan Liu
Zhanhui Zhou
Yuanfu Wang
Chao Yang
Yu Qiao
170
15
0
26 Sep 2024
Logic-of-Thought: Injecting Logic into Contexts for Full Reasoning in Large Language Models
Tongxuan Liu
Wenjiang Xu
Weizhe Huang
Yuting Zeng
Jiaxing Wang
Hailong Yang
Jing Li
LRM, ReLM
313
22
0
26 Sep 2024
Direct Judgement Preference Optimization
Peifeng Wang
Austin Xu
Yilun Zhou
Caiming Xiong
Shafiq Joty
ELM
375
23
0
23 Sep 2024
GroupDebate: Enhancing the Efficiency of Multi-Agent Debate Using Group Discussion
Tongxuan Liu
Xingyu Wang
Weizhe Huang
Wenjiang Xu
Yuting Zeng
Lei Jiang
Hailong Yang
Jing Li
LLMAG
364
40
0
21 Sep 2024
System 2 thinking in OpenAI's o1-preview model: Near-perfect performance on a mathematics exam
De Computis (DC), 2024
J. D. Winter
Dimitra Dodou
Y. B. Eisma
VLM, ELM, LRM, ReLM
314
20
0
19 Sep 2024
InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning
Xiaotian Han
Yiren Jian
Xuefeng Hu
Haogeng Liu
Yiqi Wang
...
Yuang Ai
Huaibo Huang
Ran He
Zhenheng Yang
Quanzeng You
LRM, AI4CE
206
32
0
19 Sep 2024
LogicPro: Improving Complex Logical Reasoning via Program-Guided Learning
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Jin Jiang
Yuchen Yan
Yang Liu
Yonggang Jin
Shuai Peng
Hao Fei
Xunliang Cai
Yixin Cao
Liangcai Gao
LRM
486
10
0
19 Sep 2024
CraftRTL: High-quality Synthetic Data Generation for Verilog Code Models with Correct-by-Construction Non-Textual Representations and Targeted Code Repair
International Conference on Learning Representations (ICLR), 2024
Mingjie Liu
Yun-Da Tsai
Wenfei Zhou
Haoxing Ren
SyDa, 3DV
388
42
0
19 Sep 2024
To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning
International Conference on Learning Representations (ICLR), 2024
Zayne Sprague
Fangcong Yin
Juan Diego Rodriguez
Dongwei Jiang
Manya Wadhwa
Prasann Singhal
Xinyu Zhao
Xi Ye
Kyle Mahowald
Greg Durrett
ReLM, LRM
667
238
0
18 Sep 2024
MAgICoRe: Multi-Agent, Iterative, Coarse-to-Fine Refinement for Reasoning
Justin Chih-Yao Chen
Archiki Prasad
Swarnadeep Saha
Elias Stengel-Eskin
Joey Tianyi Zhou
LRM
546
31
0
18 Sep 2024
OmniGen: Unified Image Generation
Computer Vision and Pattern Recognition (CVPR), 2024
Shitao Xiao
Yueze Wang
Huaying Yuan
Xingrun Xing
Ruiran Yan
Shuting Wang
Tiejun Huang
Zheng Liu
DiffM, VLM, SyDa
446
255
0
17 Sep 2024
Quantile Regression for Distributional Reward Models in RLHF
Nicolai Dorka
297
45
0
16 Sep 2024
MathGLM-Vision: Solving Mathematical Problems with Multi-Modal Large Language Model
Zhen Yang
Jinhao Chen
Zhengxiao Du
Wenmeng Yu
Weihan Wang
Wenyi Hong
Zhihuan Jiang
Bin Xu
Yuxiao Dong
Jie Tang
VLM, LRM
196
15
0
10 Sep 2024
Programming Refusal with Conditional Activation Steering
International Conference on Learning Representations (ICLR), 2024
Bruce W. Lee
Inkit Padhi
Karthikeyan N. Ramamurthy
Erik Miehling
Pierre Dognin
Manish Nagireddy
Amit Dhurandhar
LLMSV
502
73
0
06 Sep 2024
Towards a Unified View of Preference Learning for Large Language Models: A Survey
Bofei Gao
Feifan Song
Yibo Miao
Zefan Cai
Zhiyong Yang
...
Houfeng Wang
Zhifang Sui
Peiyi Wang
Baobao Chang
466
17
0
04 Sep 2024
Compositional 3D-aware Video Generation with LLM Director
Neural Information Processing Systems (NeurIPS), 2024
Hanxin Zhu
Tianyu He
Anni Tang
Junliang Guo
Zhibo Chen
Jiang Bian
DiffM, VGen
208
12
0
31 Aug 2024
Smaller, Weaker, Yet Better: Training LLM Reasoners via Compute-Optimal Sampling
International Conference on Learning Representations (ICLR), 2024
Hritik Bansal
Arian Hosseini
Rishabh Agarwal
Vinh Q. Tran
Mehran Kazemi
SyDa, OffRL, LRM
286
64
0
29 Aug 2024
Critic-CoT: Boosting the reasoning abilities of large language model via Chain-of-thoughts Critic
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Xin Zheng
Jie Lou
Boxi Cao
Xueru Wen
Yuqiu Ji
Hongyu Lin
Yaojie Lu
Xianpei Han
Debing Zhang
Le Sun
OffRL, LRM, LLMAG, ReLM, KELM
549
24
1
29 Aug 2024
ConsistencyTrack: A Robust Multi-Object Tracker with a Generation Strategy of Consistency Model
Lifan Jiang
Zhihui Wang
Siqi Yin
Guangxiao Ma
Peng Zhang
Boxi Wu
DiffM
348
19
0
28 Aug 2024
Large Language Models Are Self-Taught Reasoners: Enhancing LLM Applications via Tailored Problem-Solving Demonstrations
Kai Tzu-iunn Ong
Taeyoon Kwon
Jinyoung Yeo
LRM
132
1
0
22 Aug 2024
Visual Agents as Fast and Slow Thinkers
International Conference on Learning Representations (ICLR), 2024
Guangyan Sun
Haoyang Ling
Zhenting Wang
Cheng-Long Wang
Siqi Ma
Qifan Wang
Ying Nian Wu
Dongfang Liu
LLMAG, LRM
551
46
0
16 Aug 2024
Problem Solving Through Human-AI Preference-Based Cooperation
Computational Linguistics (CL), 2024
Subhabrata Dutta
Timo Kaufmann
Goran Glavaš
Ivan Habernal
Kristian Kersting
Frauke Kreuter
Mira Mezini
Iryna Gurevych
Eyke Hüllermeier
Hinrich Schuetze
873
7
0
14 Aug 2024
Can Large Language Models Reason? A Characterization via 3-SAT
Rishi Hazra
Gabriele Venturato
Pedro Zuidberg Dos Martires
Luc de Raedt
ELM, ReLM, LRM
249
16
0
13 Aug 2024
Speculations on Uncertainty and Humane Algorithms
Nicholas Gray
222
1
0
13 Aug 2024
Re-TASK: Revisiting LLM Tasks from Capability, Skill, and Knowledge Perspectives
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Zhihu Wang
Shiwan Zhao
Yu Wang
Heyuan Huang
Sitao Xie
Xicheng Zhang
Jiaxin Shi
Zhixing Wang
Xue Yang
Junchi Yan
LRM
380
13
0
13 Aug 2024
Semantic Skill Grounding for Embodied Instruction-Following in Cross-Domain Environments
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Sangwoo Shin
Takehiro Matsuoka
Youngsoo Jang
Moontae Lee
Kazuya Yoshida
415
0
0
02 Aug 2024
ThinK: Thinner Key Cache by Query-Driven Pruning
International Conference on Learning Representations (ICLR), 2024
Yuhui Xu
Zhanming Jie
Hanze Dong
Lei Wang
Xudong Lu
Aojun Zhou
Amrita Saha
Caiming Xiong
Doyen Sahoo
536
42
0
30 Jul 2024
Meta-Rewarding Language Models: Self-Improving Alignment with LLM-as-a-Meta-Judge
Tianhao Wu
Weizhe Yuan
O. Yu. Golovneva
Jing Xu
Yuandong Tian
Jiantao Jiao
Jason Weston
Sainbayar Sukhbaatar
ALM, KELM, LRM
373
156
0
28 Jul 2024
Prover-Verifier Games improve legibility of LLM outputs
Jan Hendrik Kirchner
Yining Chen
Harri Edwards
Jan Leike
Nat McAleese
Yuri Burda
LRM, AAML
282
51
0
18 Jul 2024
Weak-to-Strong Reasoning
Yuqing Yang
Yan Ma
Pengfei Liu
LRM
330
28
0
18 Jul 2024
Questionable practices in machine learning
Gavin Leech
Juan J. Vazquez
Misha Yagudin
Niclas Kupper
Laurence Aitchison
277
6
0
17 Jul 2024
Reflective Instruction Tuning: Mitigating Hallucinations in Large Vision-Language Models
Jinrui Zhang
Teng Wang
Haigang Zhang
Ping Lu
Feng Zheng
MLLM, LRM, VLM
307
10
0
16 Jul 2024
Token-Supervised Value Models for Enhancing Mathematical Problem-Solving Capabilities of Large Language Models
Jung Hyun Lee
June Yong Yang
Byeongho Heo
Dongyoon Han
Kang Min Yoo
Eunho Yang
LRM
108
1
0
12 Jul 2024
Self-training Language Models for Arithmetic Reasoning
Marek Kadlcík
Michal Štefánik
KELM, ReLM, OffRL, LRM
166
1
0
11 Jul 2024
DotaMath: Decomposition of Thought with Code Assistance and Self-correction for Mathematical Reasoning
Chengpeng Li
Guanting Dong
Mingfeng Xue
Ru Peng
Xiang Wang
Dayiheng Liu
LRM, ReLM
349
25
0
04 Jul 2024
52B to 1T: Lessons Learned via Tele-FLM Series
Xiang Li
Yiqun Yao
Xin Jiang
Xuezhi Fang
Chao Wang
...
Yequan Wang
Zhongjiang He
Zhongyuan Wang
Xuelong Li
Tiejun Huang
ALM, LRM
201
9
0
03 Jul 2024