MiniF2F: a cross-system benchmark for formal Olympiad-level mathematics

International Conference on Learning Representations (ICLR), 2021

31 August 2021

Papers citing "MiniF2F: a cross-system benchmark for formal Olympiad-level mathematics"

50 / 170 papers shown

IndiMathBench: Autoformalizing Mathematical Reasoning Problems with a Human Touch

295

30 Nov 2025

Gold-Medal-Level Olympiad Geometry Solving with Efficient Heuristic Auxiliary Constructions

27 Nov 2025

Spark-Prover-X1: Formal Theorem Proving Through Diverse Data Training

394

17 Nov 2025

Improving Autoformalization Using Direct Dependency Retrieval

Shaoqi Wang

Lu Yu

Chunjie Yang

Feng Yan

Qing Cui

Jun Zhou

145

15 Nov 2025

Towards Autoformalization of LLM-generated Outputs for Requirement Verification

Mihir Gupte

Ramesh S

14 Nov 2025

miniF2F-Lean Revisited: Reviewing Limitations and Charting a Path Forward

Azim Ospanov

Farzan Farnia

Roozbeh Yousefzadeh

138

05 Nov 2025

The ORCA Benchmark: Evaluating Real-World Calculation Accuracy in Large Language Models

Joanna Śmietańska-Nowak

ELM ALM LRM

397

04 Nov 2025

FATE: A Formal Benchmark Series for Frontier Algebra of Multiple Difficulty Levels

...

413

04 Nov 2025

ReForm: Reflective Autoformalization with Prospective Bounded Sequence Optimization

337

28 Oct 2025

ProofBridge: Auto-Formalization of Natural Language Proofs in Lean via Joint Embeddings

424

17 Oct 2025

Max It or Miss It: Benchmarking LLM On Solving Extremal Problems

Binxin Gao

Jingjun Han

ELM LRM

210

14 Oct 2025

Ax-Prover: A Deep Reasoning Agentic Framework for Theorem Proving in Mathematics and Quantum Physics

Marco Del Tredici

Jacob McCarran

Benjamin Breen

Javier Aspuru Mijares

246

14 Oct 2025

TopoAlign: A Framework for Aligning Code to Math via Topological Decomposition

Yupei Li

Philipp Borchert

Gerasimos Lampouras

110

13 Oct 2025

GAR: Generative Adversarial Reinforcement Learning for Formal Theorem Proving

13 Oct 2025

MASA: LLM-Driven Multi-Agent Systems for Autoformalization

101

10 Oct 2025

RefGrader: Automated Grading of Mathematical Competition Proofs using Agentic Workflows

Niloofar Mireshghallah

V. Honavar

157

10 Oct 2025

PRISM-Physics: Causal DAG-Based Process Evaluation for Physics Reasoning

139

03 Oct 2025

RAPID: An Efficient Reinforcement Learning Algorithm for Small Language Models

167

03 Oct 2025

Aristotle: IMO-level Automated Theorem Proving

...

168

01 Oct 2025

Benchmarking Foundation Models with Retrieval-Augmented Generation in Olympic-Level Physics Problem Solving

242

01 Oct 2025

Atomic Thinking of LLMs: Decoupling and Exploring Mathematical Reasoning Abilities

...

168

30 Sep 2025

Hilbert: Recursively Building Formal Proofs with Informal Reasoning

228

26 Sep 2025

FormalML: A Benchmark for Evaluating Formal Subgoal Completion in Machine Learning Theory

126

26 Sep 2025

ASSESS: A Semantic and Structural Evaluation Framework for Statement Similarity

104

26 Sep 2025

EngiBench: A Benchmark for Evaluating Large Language Models on Engineering Problem Solving

...

22 Sep 2025

Large Language Models as End-to-end Combinatorial Optimization Solvers

199

21 Sep 2025

EconProver: Towards More Economical Test-Time Scaling for Automated Theorem Proving

136

16 Sep 2025

REAMS: Reasoning Enhanced Algorithm for Maths Solving

217

16 Sep 2025

Natural Language Translation of Formal Proofs through Informalization of Proof Steps and Recursive Summarization along Proof Structure

Seiji Hattori

Takuya Matsuzaki

Makoto Fujiwara

10 Sep 2025

A Fragile Number Sense: Probing the Elemental Limits of Numerical Reasoning in LLMs

Roussel Rahman

Aashwin Ananda Mishra

LRM

08 Sep 2025

A Case Study on the Effectiveness of LLMs in Verification with Proof Assistants

Barış Bayazıt

Yao Li

Xujie Si

26 Aug 2025

FormaRL: Enhancing Autoformalization with no Labeled Data

250

26 Aug 2025

Lean Meets Theoretical Computer Science: Scalable Synthesis of Theorem Proving Challenges in Formal-Informal Pairs

205

21 Aug 2025

Too Easily Fooled? Prompt Injection Breaks LLMs on Frustratingly Simple Multiple-Choice Questions

140

16 Aug 2025

An Investigation of Robustness of LLMs in Mathematical Reasoning: Benchmarking with Mathematically-Equivalent Transformation of Advanced Mathematical Problems

222

12 Aug 2025

Automated Formalization via Conceptual Retrieval-Augmented LLMs

156

09 Aug 2025

StepFun-Formalizer: Unlocking the Autoformalization Potential of LLMs through Knowledge-Reasoning Fusion

...

208

06 Aug 2025

Goedel-Prover-V2: Scaling Formal Theorem Proving with Scaffolded Data Synthesis and Self-Correction

...

174

05 Aug 2025

Proof2Hybrid: Automatic Mathematical Benchmark Synthesis for Proof-Centric Problems

224

04 Aug 2025

Seed-Prover: Deep and Broad Reasoning for Automated Theorem Proving

...

230

31 Jul 2025

Solving Formal Math Problems by Decomposition and Iterative Reflection

...

144

21 Jul 2025

LeanTree: Accelerating White-Box Proof Search with Factorized States in Lean 4

201

19 Jul 2025

ProofCompass: Enhancing Specialized Provers with LLM Guidance

Nicolas Wischermann

Claudio Mayrink Verdun

Gabriel Poesia

Francesco Noseda

LRM

169

18 Jul 2025

Generalized Tree Edit Distance (GTED): A Faithful Evaluation Metric for Statement Autoformalization

163

10 Jul 2025

Prover Agent: An Agent-Based Framework for Formal Mathematical Proofs

349

24 Jun 2025

Towards Advanced Mathematical Reasoning for LLMs via First-Order Logic Theorem Proving

150

20 Jun 2025

Beyond Gold Standards: Epistemic Ensemble of LLM Judges for Formal Mathematical Reasoning

Lan Zhang

Marco Valentino

André Freitas

287

12 Jun 2025

A Survey on Large Language Models for Mathematical Reasoning

...

272

10 Jun 2025

Mathesis: Towards Formal Theorem Proving from Natural Languages

...

225

08 Jun 2025

MATP-BENCH: Can MLLM Be a Good Automated Theorem Prover for Multimodal Problems?

232

06 Jun 2025