MiniF2F: a cross-system benchmark for formal Olympiad-level mathematics

International Conference on Learning Representations (ICLR), 2021

31 August 2021

Papers citing "MiniF2F: a cross-system benchmark for formal Olympiad-level mathematics"

50 / 170 papers shown

Can Language Models Rival Mathematics Students? Evaluating Mathematical Reasoning through Textual Manipulation and Human Experiments

188

16 Dec 2024

Proposing and solving olympiad geometry with guided tree search

264

14 Dec 2024

Formal Theorem Proving by Rewarding LLMs to Decompose Proofs Hierarchically

294

04 Nov 2024

Autoformalize Mathematical Statements by Symbolic Equivalence and Semantic Consistency

Neural Information Processing Systems (NeurIPS), 2024

Yifan Wu

254

28 Oct 2024

Library Learning Doesn't: The Curious Case of the Single-Use "Library"

Ian Berlot-Attwell

Frank Rudzicz

Xujie Si

178

26 Oct 2024

Alchemy: Amplifying Theorem-Proving Capability through Symbolic Mutation

International Conference on Learning Representations (ICLR), 2024

295

21 Oct 2024

3D-Prover: Diversity Driven Theorem Proving With Determinantal Point Processes

308

14 Oct 2024

Omni-MATH: A Universal Olympiad Level Mathematic Benchmark For Large Language Models

International Conference on Learning Representations (ICLR), 2024

Yibo Miao

...

Lei Sha

Yichang Zhang

Xuancheng Ren

Tianyu Liu

Baobao Chang

ELM LRM

280

135

10 Oct 2024

Herald: A Natural Language Annotated Lean 4 Dataset

International Conference on Learning Representations (ICLR), 2024

Yutong Wang

404

09 Oct 2024

Consistent Autoformalization for Constructing Mathematical Libraries

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024

206

05 Oct 2024

Proof Automation with Large Language Models

170

22 Sep 2024

SubgoalXL: Subgoal-based Expert Learning for Theorem Proving

269

20 Aug 2024

DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search

Junxiao Song

...

Fuli Luo

288

132

15 Aug 2024

miniCTX: Neural Theorem Proving with (Long-)Contexts

International Conference on Learning Representations (ICLR), 2024

454

05 Aug 2024

LEAN-GitHub: Compiling GitHub LEAN repositories for a versatile LEAN prover

Zijian Wu

Jiayu Wang

Dahua Lin

Kai-xiang Chen

295

24 Jul 2024

PutnamBench: Evaluating Neural Theorem-Provers on the Putnam Mathematical Competition

George Tsoukalas

Jimmy Xin

Swarat Chaudhuri

254

15 Jul 2024

Lean-STaR: Learning to Interleave Thinking and Proving

683

14 Jul 2024

Is Your Model Really A Good Math Reasoner? Evaluating Mathematical Reasoning with Checklist

Xiaowei Huang

Qiufeng Wang

Kaizhu Huang

ELM LRM

254

11 Jul 2024

Towards Automated Functional Equation Proving: A Benchmark Dataset and A Domain-Specific In-Context Agent

Mahdi Buali

Robert Hoehndorf

242

05 Jul 2024

TheoremLlama: Transforming General-Purpose LLMs into Lean4 Experts

Jipeng Zhang

Tong Zhang

284

03 Jul 2024

Learning Formal Mathematics From Intrinsic Motivation

309

30 Jun 2024

FVEL: Interactive Formal Verification Environment with Large Language Models via Theorem Proving

Zhengying Liu

164

20 Jun 2024

Proving Olympiad Algebraic Inequalities without Human Demonstrations

238

20 Jun 2024

DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving

240

18 Jun 2024

miniCodeProps: a Minimal Benchmark for Proving Code Properties

Evan Lohn

Sean Welleck

181

16 Jun 2024

Reliable Evaluation and Benchmarks for Statement Autoformalization

412

11 Jun 2024

Lean Workbook: A large-scale Lean problem set formalized from natural language math problems

501

06 Jun 2024

Process-Driven Autoformalization in Lean 4

394

04 Jun 2024

Proving Theorems Recursively

Neural Information Processing Systems (NeurIPS), 2024

Zhengying Liu

...

Xiaodan Liang

218

23 May 2024

DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data

Zhihong Shao

Bo Liu

Xiaodan Liang

286

151

23 May 2024

Lean Copilot: Large Language Models as Copilots for Theorem Proving in Lean

Peiyang Song

Kaiyu Yang

A. Anandkumar

372

18 Apr 2024

A Survey on Deep Learning for Theorem Proving

286

15 Apr 2024

Don't Trust: Verify -- Grounding LLM Quantitative Reasoning with Autoformalization

190

26 Mar 2024

BAIT: Benchmarking (Embedding) Architectures for Interactive Theorem-Proving

279

06 Mar 2024

GSM-Plus: A Comprehensive Benchmark for Evaluating the Robustness of LLMs as Mathematical Problem Solvers

Qintong Li

Leyang Cui

Xueliang Zhao

Lingpeng Kong

Wei Bi

LRM

334

107

29 Feb 2024

Measuring Vision-Language STEM Skills of Neural Models

426

27 Feb 2024

OlympiadBench: A Challenging Benchmark for Promoting AGI with Olympiad-Level Bilingual Multimodal Scientific Problems

...

Yuxiang Zhang

Jie Liu

Lei Qi

Zhiyuan Liu

Maosong Sun

ELM AIMat

401

677

21 Feb 2024

MUSTARD: Mastering Uniform Synthesis of Theorem and Proof Data

Zhengying Liu

Linqi Song

Xiaodan Liang

ALM

374

14 Feb 2024

InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning

Zhejian Zhou

...

Xipeng Qiu

Dahua Lin

235

114

09 Feb 2024

DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

Zhihong Shao

Peiyi Wang

Runxin Xu

...

1.5K

3,856

05 Feb 2024

Large Language Models for Mathematical Reasoning: Progresses and Challenges

361

267

31 Jan 2024

Evaluating LLMs' Mathematical and Coding Competency through Ontology-guided Interventions

Annual Meeting of the Association for Computational Linguistics (ACL), 2024

360

17 Jan 2024

Enhancing Neural Theorem Proving through Data Augmentation and Dynamic Sampling Method

Rahul Vishwakarma

Subhankar Mishra

AIMat

265

20 Dec 2023

Modeling Complex Mathematical Reasoning via Large Language Model based MathAgent

195

14 Dec 2023

Large Language Models' Understanding of Math: Source Criticism and Extrapolation

Roozbeh Yousefzadeh

Xuenan Cao

ELM LRM

127

12 Nov 2023

FormalGeo: An Extensible Formalized Framework for Olympiad Geometric Problem Solving

...

390

27 Oct 2023

Llemma: An Open Language Model For Mathematics

International Conference on Learning Representations (ICLR), 2023

Albert Q. Jiang

328

386

16 Oct 2023

TRIGO: Benchmarking Formal Mathematical Proof Reduction for Generative Language Models

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

...

Xiaodan Liang

Qun Liu

203

16 Oct 2023

A New Approach Towards Autoformalization

370

12 Oct 2023

An In-Context Learning Agent for Formal Theorem-Proving

Amitayush Thakur

George Tsoukalas

Yeming Wen

Jimmy Xin

Swarat Chaudhuri

LLMAG

271

06 Oct 2023