Let's Verify Step by Step

International Conference on Learning Representations (ICLR), 2023

31 May 2023

Papers citing "Let's Verify Step by Step"

50 / 1,447 papers shown

Reasoning Paths Optimization: Learning to Reason and Explore From Diverse Paths

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024

253

07 Oct 2024

From Sparse Dependence to Sparse Attention: Unveiling How Chain-of-Thought Enhances Transformer Sample Efficiency

International Conference on Learning Representations (ICLR), 2024

Jingzhao Zhang

630

07 Oct 2024

Active Fine-Tuning of Multi-Task Policies

584

07 Oct 2024

Improving LLM Reasoning through Scaling Inference Computation with Collaborative Verification

Ye Liu

Yingbo Zhou

278

05 Oct 2024

Misinformation with Legal Consequences (MisLC): A New Task Towards Harnessing Societal Harm of Misinformation

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024

Xiaodan Zhu

242

04 Oct 2024

System 2 Reasoning Capabilities Are Nigh

Scott C. Lowe

VLM LRM

204

04 Oct 2024

LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level Mathematical Reasoning

...

Marco Pavone

Yuqiang Li

Wanli Ouyang

Dongzhan Zhou

LRM

272

03 Oct 2024

Beyond Bradley-Terry Models: A General Preference Model for Language Model Alignment

503

03 Oct 2024

Learning to Better Search with Language Models via Guided Reinforced Self-Training

313

03 Oct 2024

CodePMP: Scalable Preference Model Pretraining for Large Language Model Reasoning

347

03 Oct 2024

ReGenesis: LLMs can Grow into Reasoning Generalists via Self-Improvement

International Conference on Learning Representations (ICLR), 2024

Xiangyu Peng

Congying Xia

Xinyi Yang

Caiming Xiong

Chien-Sheng Wu

Chen Xing

LRM

375

03 Oct 2024

GraphIC: A Graph-Based In-Context Example Retrieval Model for Multi-Step Reasoning

530

03 Oct 2024

Training Nonlinear Transformers for Chain-of-Thought Inference: A Theoretical Generalization Analysis

International Conference on Learning Representations (ICLR), 2024

561

03 Oct 2024

Evaluating Robustness of Reward Models for Mathematical Reasoning

Sunghwan Kim

Jinyoung Yeo

220

02 Oct 2024

Can We Further Elicit Reasoning in LLMs? Critic-Guided Planning with Retrieval-Augmentation for Solving Challenging Tasks

Annual Meeting of the Association for Computational Linguistics (ACL), 2024

270

02 Oct 2024

Step-by-Step Reasoning for Math Problems via Twisted Sequential Monte Carlo

International Conference on Learning Representations (ICLR), 2024

Xiang Kong

Aonan Zhang

Yiming Yang

483

02 Oct 2024

LASeR: Learning to Adaptively Select Reward Models with Multi-Armed Bandits

553

02 Oct 2024

TypedThinker: Diversify Large Language Model Reasoning with Typed Thinking

International Conference on Learning Representations (ICLR), 2024

Danqing Wang

930

02 Oct 2024

Closed-Loop Long-Horizon Robotic Planning via Equilibrium Sequence Modeling

Jinghan Li

Zhicheng Sun

Fei Li

856

02 Oct 2024

RATIONALYST: Mining Implicit Rationales for Process Supervision of Reasoning

224

01 Oct 2024

Inference-Time Language Model Alignment via Integrated Value Guidance

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024

Zhixuan Liu

Zhanhui Zhou

Yuanfu Wang

Chao Yang

Yu Qiao

188

26 Sep 2024

Logic-of-Thought: Injecting Logic into Contexts for Full Reasoning in Large Language Models

Tongxuan Liu

Wenjiang Xu

Weizhe Huang

Yuting Zeng

Jiaxing Wang

Hailong Yang

Jing Li

LRM ReLM

337

26 Sep 2024

Direct Judgement Preference Optimization

396

23 Sep 2024

GroupDebate: Enhancing the Efficiency of Multi-Agent Debate Using Group Discussion

Tongxuan Liu

Xingyu Wang

Weizhe Huang

Wenjiang Xu

Jing Li

394

21 Sep 2024

System 2 thinking in OpenAI's o1-preview model: Near-perfect performance on a mathematics exam

De Computis (DC), 2024

355

19 Sep 2024

InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning

Xiaotian Han

Yiren Jian

Xuefeng Hu

Haogeng Liu

Yiqi Wang

...

Yuang Ai

Huaibo Huang

Ran He

Zhenheng Yang

Quanzeng You

LRM AI4CE

224

19 Sep 2024

LogicPro: Improving Complex Logical Reasoning via Program-Guided Learning

Annual Meeting of the Association for Computational Linguistics (ACL), 2024

499

19 Sep 2024

CraftRTL: High-quality Synthetic Data Generation for Verilog Code Models with Correct-by-Construction Non-Textual Representations and Targeted Code Repair

International Conference on Learning Representations (ICLR), 2024

412

19 Sep 2024

MAgICoRe: Multi-Agent, Iterative, Coarse-to-Fine Refinement for Reasoning

603

18 Sep 2024

To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning

International Conference on Learning Representations (ICLR), 2024

738

265

18 Sep 2024

OmniGen: Unified Image Generation

Computer Vision and Pattern Recognition (CVPR), 2024

Shitao Xiao

Yueze Wang

Zheng Liu

496

299

17 Sep 2024

Quantile Regression for Distributional Reward Models in RLHF

Nicolai Dorka

359

16 Sep 2024

MathGLM-Vision: Solving Mathematical Problems with Multi-Modal Large Language Model

Zhen Yang

Jinhao Chen

Bin Xu

Yuxiao Dong

Jie Tang

VLM LRM

213

10 Sep 2024

Programming Refusal with Conditional Activation Steering

International Conference on Learning Representations (ICLR), 2024

Bruce W. Lee

Inkit Padhi

Karthikeyan N. Ramamurthy

535

06 Sep 2024

Towards a Unified View of Preference Learning for Large Language Models: A Survey

...

Houfeng Wang

Zhifang Sui

Peiyi Wang

Baobao Chang

498

04 Sep 2024

Compositional 3D-aware Video Generation with LLM Director

Neural Information Processing Systems (NeurIPS), 2024

Zhibo Chen

219

31 Aug 2024

Smaller, Weaker, Yet Better: Training LLM Reasoners via Compute-Optimal Sampling

International Conference on Learning Representations (ICLR), 2024

Rishabh Agarwal

304

29 Aug 2024

Critic-CoT: Boosting the reasoning abilities of large language model via Chain-of-thoughts Critic

Annual Meeting of the Association for Computational Linguistics (ACL), 2024

OffRL LRM LLMAG ReLM KELM

578

29 Aug 2024

ConsistencyTrack: A Robust Multi-Object Tracker with a Generation Strategy of Consistency Model

Lifan Jiang

Zhihui Wang

Siqi Yin

Guangxiao Ma

Peng Zhang

Boxi Wu

DiffM

360

28 Aug 2024

Large Language Models Are Self-Taught Reasoners: Enhancing LLM Applications via Tailored Problem-Solving Demonstrations

Kai Tzu-iunn Ong

Taeyoon Kwon

Jinyoung Yeo

LRM

140

22 Aug 2024

Visual Agents as Fast and Slow Thinkers

International Conference on Learning Representations (ICLR), 2024

Zhenting Wang

579

16 Aug 2024

Problem Solving Through Human-AI Preference-Based Cooperation

Computational Linguistics (CL), 2024

995

14 Aug 2024

Can Large Language Models Reason? A Characterization via 3-SAT

Rishi Hazra

Gabriele Venturato

Pedro Zuidberg Dos Martires

Luc de Raedt

ELM ReLM LRM

260

13 Aug 2024

Speculations on Uncertainty and Humane Algorithms

Nicholas Gray

229

13 Aug 2024

Re-TASK: Revisiting LLM Tasks from Capability, Skill, and Knowledge Perspectives

Annual Meeting of the Association for Computational Linguistics (ACL), 2024

428

13 Aug 2024

Semantic Skill Grounding for Embodied Instruction-Following in Cross-Domain Environments

Annual Meeting of the Association for Computational Linguistics (ACL), 2024

446

02 Aug 2024

ThinK: Thinner Key Cache by Query-Driven Pruning

International Conference on Learning Representations (ICLR), 2024

626

30 Jul 2024

Meta-Rewarding Language Models: Self-Improving Alignment with LLM-as-a-Meta-Judge

Jason Weston

435

171

28 Jul 2024

Prover-Verifier Games improve legibility of LLM outputs

332

18 Jul 2024

Weak-to-Strong Reasoning

402

18 Jul 2024