Let's Verify Step by Step

International Conference on Learning Representations (ICLR), 2023

31 May 2023

Papers citing "Let's Verify Step by Step"

50 / 1,441 papers shown

Token-level Proximal Policy Optimization for Query Generation

Chenghua Huang

...

897

01 Nov 2024

RL-STaR: Theoretical Analysis of Reinforcement Learning Frameworks for Self-Taught Reasoner

993

31 Oct 2024

Smaller Large Language Models Can Do Moral Self-Correction

Kristen Marie Johnson

LRM

357

30 Oct 2024

Flow-DPO: Improving LLM Mathematical Reasoning through Online Multi-Agent Learning

Yihe Deng

Paul Mineiro

LRM

213

29 Oct 2024

AutoGLM: Autonomous Foundation Agents for GUIs

Xiao Liu

...

Yujia Wang

313

28 Oct 2024

HoPE: A Novel Positional Encoding Without Long-Term Decay for Enhanced Context Awareness and Extrapolation

Annual Meeting of the Association for Computational Linguistics (ACL), 2024

Yuhan Chen

Ang Lv

Jian Luan

Bin Wang

Wen Liu

227

28 Oct 2024

Matryoshka Pilot: Learning to Drive Black-Box LLMs with LLMs

325

28 Oct 2024

Guiding Through Complexity: What Makes Good Supervision for Hard Math Reasoning Tasks?

265

27 Oct 2024

GFlowNet Fine-tuning for Diverse Correct Solutions in Mathematical Reasoning Tasks

193

26 Oct 2024

GHIL-Glue: Hierarchical Control with Filtered Subgoal Images

IEEE International Conference on Robotics and Automation (ICRA), 2024

...

312

26 Oct 2024

Cooperative Strategic Planning Enhances Reasoning Capabilities in Large Language Models

200

25 Oct 2024

2D-DPO: Scaling Direct Preference Optimization with 2-Dimensional Supervision

Yancheng He

Bo Zheng

227

25 Oct 2024

Improving Small-Scale Large Language Models Function Calling for Reasoning Tasks

Graziano A. Manduzio

Federico A. Galatolo

M. G. Cimino

Enzo Pasquale Scilingo

Lorenzo Cominelli

LRM

192

24 Oct 2024

Hybrid Preferences: Learning to Route Instances for Human vs. AI Feedback

Annual Meeting of the Association for Computational Linguistics (ACL), 2024

Lester James V. Miranda

457

24 Oct 2024

Process Supervision-Guided Policy Optimization for Code Generation

Liang Huang

263

23 Oct 2024

RM-Bench: Benchmarking Reward Models of Language Models with Subtlety and Style

International Conference on Learning Representations (ICLR), 2024

Juanzi Li

345

100

21 Oct 2024

On Designing Effective RL Reward at Training Time for LLM Reasoning

Yi Wu

549

19 Oct 2024

Step Guided Reasoning: Improving Mathematical Reasoning using Guidance Generation and Step Reasoning

373

18 Oct 2024

Think Thrice Before You Act: Progressive Thought Refinement in Large Language Models

...

245

17 Oct 2024

MCQG-SRefine: Multiple Choice Question Generation and Evaluation with Iterative Self-Critique, Correction, and Comparison Feedback

North American Chapter of the Association for Computational Linguistics (NAACL), 2024

429

17 Oct 2024

Enhancing Mathematical Reasoning in LLMs by Stepwise Correction

Annual Meeting of the Association for Computational Linguistics (ACL), 2024

Zhaoxuan Tan

264

16 Oct 2024

A Survey on Data Synthesis and Augmentation for Large Language Models

...

425

16 Oct 2024

Not All Votes Count! Programs as Verifiers Improve Self-Consistency of Language Models for Math Reasoning

Vernon Y.H. Toh

Deepanway Ghosal

Soujanya Poria

LRM

182

16 Oct 2024

MSc-SQL: Multi-Sample Critiquing Small Language Models For Text-To-SQL Translation

North American Chapter of the Association for Computational Linguistics (NAACL), 2024

427

16 Oct 2024

JudgeBench: A Benchmark for Evaluating LLM-based Judges

International Conference on Learning Representations (ICLR), 2024

Ion Stoica

722

149

16 Oct 2024

Process Reward Model with Q-Value Rankings

International Conference on Learning Representations (ICLR), 2024

W. Li

Yixuan Li

LRM

631

15 Oct 2024

Agent-as-a-Judge: Evaluate Agents with Agents

Wenyi Wang

...

Raghuraman Krishnamoorthi

411

112

14 Oct 2024

Overcoming classic challenges for artificial neural networks by providing incentives and practice

Nature Machine Intelligence (Nat. Mach. Intell.), 2024

Kazuki Irie

Brenden M. Lake

585

14 Oct 2024

Innovative Thinking, Infinite Humor: Humor Research of Large Language Models through Structured Thought Leaps

International Conference on Learning Representations (ICLR), 2024

Xiaohan Wang

459

14 Oct 2024

CoMAT: Chain of Mathematically Annotated Thought Improves Mathematical Reasoning

412

14 Oct 2024

Language Model Embeddings Can Be Sufficient for Bayesian Optimization

366

14 Oct 2024

COrAL: Order-Agnostic Language Modeling for Efficient Iterative Refinement

901

12 Oct 2024

OpenR: An Open Source Framework for Advanced Reasoning with Large Language Models

Jun Wang

...

232

12 Oct 2024

Boosting Deductive Reasoning with Step Signals In RLHF

Yipin Zhang

156

12 Oct 2024

Enhancing Multi-Step Reasoning Abilities of Language Models through Direct Q-Function Optimization

Ning Dai

Quanquan Gu

Lin Yan

OffRL LRM

376

11 Oct 2024

Rewarding Progress: Scaling Automated Process Verifiers for LLM Reasoning

International Conference on Learning Representations (ICLR), 2024

Amrith Rajagopal Setlur

Rishabh Agarwal

Aviral Kumar

401

166

10 Oct 2024

Efficient Reinforcement Learning with Large Language Model Priors

Xue Yan

Yan Song

Xidong Feng

Mengyue Yang

Haifeng Zhang

Haitham Bou Ammar

Jun Wang

OffRL

213

10 Oct 2024

Automatic Curriculum Expert Iteration for Reliable LLM Reasoning

International Conference on Learning Representations (ICLR), 2024

Hanze Dong

Caiming Xiong

364

10 Oct 2024

Towards Self-Improvement of LLMs via MCTS: Leveraging Stepwise Knowledge with Curriculum Preference Learning

Xiyao Wang

Furong Huang

303

09 Oct 2024

Subtle Errors in Reasoning: Preference Learning via Error-injected Self-editing

Annual Meeting of the Association for Computational Linguistics (ACL), 2024

1.0K

09 Oct 2024

O1 Replication Journey: A Strategic Progress Report -- Part 1

...

364

137

08 Oct 2024

On the Modeling Capabilities of Large Language Models for Sequential Decision Making

International Conference on Learning Representations (ICLR), 2024

309

08 Oct 2024

FG-PRM: Fine-grained Hallucination Detection and Mitigation in Language Model Mathematical Reasoning

256

08 Oct 2024

Rationale-Aware Answer Verification by Pairwise Self-Evaluation

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024

Akira Kawabata

Saku Sugawara

LRM

346

07 Oct 2024

Reasoning Paths Optimization: Learning to Reason and Explore From Diverse Paths

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024

241

07 Oct 2024

From Sparse Dependence to Sparse Attention: Unveiling How Chain-of-Thought Enhances Transformer Sample Efficiency

International Conference on Learning Representations (ICLR), 2024

Jingzhao Zhang

573

07 Oct 2024

Active Fine-Tuning of Multi-Task Policies

566

07 Oct 2024

Improving LLM Reasoning through Scaling Inference Computation with Collaborative Verification

Ye Liu

Yingbo Zhou

267

05 Oct 2024

Misinformation with Legal Consequences (MisLC): A New Task Towards Harnessing Societal Harm of Misinformation

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024

Xiaodan Zhu

215

04 Oct 2024

System 2 Reasoning Capabilities Are Nigh

Scott C. Lowe

VLM LRM

203

04 Oct 2024