
MiniF2F: a cross-system benchmark for formal Olympiad-level mathematics

International Conference on Learning Representations (ICLR), 2021

31 August 2021

Papers citing "MiniF2F: a cross-system benchmark for formal Olympiad-level mathematics"

50 / 170 papers shown

Safe: Enhancing Mathematical Reasoning in Large Language Models via Retrospective Step-aware Formal Verification

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

370

05 Jun 2025

STORM-BORN: A Challenging Mathematical Derivations Dataset Curated via a Human-in-the-Loop Multi-Agent Framework

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

...

295

02 Jun 2025

ORMind: A Cognitive-Inspired End-to-End Reasoning Framework for Operations Research

260

02 Jun 2025

SiLVR: A Simple Language-based Video Reasoning Framework

185

30 May 2025

Autoformalization in the Era of Large Language Models: A Survey

340

29 May 2025

DeepTheorem: Advancing LLM Reasoning for Theorem Proving Through Natural Language and Reinforcement Learning

...

307

29 May 2025

Let's Reason Formally: Natural-Formal Hybrid Reasoning Enhances LLM's Math Capability

354

29 May 2025

ASyMOB: Algebraic Symbolic Mathematical Operations Benchmark

M. Shalyt

Rotem Elimelech

I. Kaminer

153

28 May 2025

Decomposing Elements of Problem Solving: What "Math" Does RL Teach?

207

28 May 2025

SMART: Self-Generating and Self-Validating Multi-Dimensional Assessment for LLMs' Mathematical Problem Solving

472

22 May 2025

HybridProver: Augmenting Theorem Proving with LLM-Driven Proof Synthesis and Refinement

160

21 May 2025

CLEVER: A Curated Benchmark for Formally Verified Code Generation

500

20 May 2025

LEXam: Benchmarking Legal Reasoning on 340 Law Exams

...

545

19 May 2025

Ineq-Comp: Benchmarking Human-Intuitive Compositional Reasoning in Automated Theorem Proving on Inequalities

363

19 May 2025

LLM-based Automated Theorem Proving Hinges on Scalable Synthetic Data Generation

300

17 May 2025

MPS-Prover: Advancing Stepwise Theorem Proving by Multi-Perspective Search and Data Curation

320

16 May 2025

Beyond Theorem Proving: Formulation, Framework and Benchmark for Formal Problem-Solving

319

07 May 2025

CombiBench: Benchmarking LLM Capability for Combinatorial Mathematics

...

580

06 May 2025

FormalMATH: Benchmarking Formal Mathematical Reasoning of Large Language Models

...

748

05 May 2025

Hierarchical Attention Generates Better Proofs

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

231

27 Apr 2025

APE-Bench I: Towards File-level Automated Proof Engineering of Formal Math Libraries

290

27 Apr 2025

Neural Theorem Proving: Generating and Structuring Proofs for Formal Verification

Balaji Rao

William Eiers

Carlo Lipizzi

414

23 Apr 2025

Kimina-Prover Preview: Towards Large Formal Reasoning Models with Reinforcement Learning

...

334

15 Apr 2025

Reasoning Models Can Be Effective Without Thinking

423

109

14 Apr 2025

Leanabell-Prover: Posttraining Scaling in Formal Reasoning

483

08 Apr 2025

Brains vs. Bytes: Evaluating LLM Proficiency in Olympiad Mathematics

428

01 Apr 2025

Proof or Bluff? Evaluating LLMs on 2025 USA Math Olympiad

505

27 Mar 2025

Rosetta-PL: Propositional Logic as a Benchmark for Large Language Model Reasoning

North American Chapter of the Association for Computational Linguistics (NAACL), 2025

Shaun Baek

Shaun Esua-Mensah

Cyrus Tsui

Sejan Vigneswaralingam

671

25 Mar 2025

A Survey on Mathematical Reasoning and Optimization with Large Language Models

Ali Forootani

OffRL LRM AI4CE

308

22 Mar 2025

Local Look-Ahead Guidance via Verifier-in-the-Loop for Automated Theorem Proving

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

366

12 Mar 2025

FANS -- Formal Answer Selection for Natural Language Math Reasoning Using Lean4

341

05 Mar 2025

From Hypothesis to Publication: A Comprehensive Survey of AI-Driven Research Support Systems

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2025

...

503

03 Mar 2025

CuDIP: Enhancing Theorem Proving in LLMs via Curriculum Learning-based Direct Preference Optimization

344

25 Feb 2025

Steering LLMs for Formal Theorem Proving

Shashank Kirtania

Arun Shankar Iyer

LLMSV

1.1K

21 Feb 2025

A Survey on Feedback-based Multi-step Reasoning for Large Language Models on Mathematics

741

21 Feb 2025

Simplifying Formal Proof-Generating Models with ChatGPT and Basic Searching Techniques

1.1K

20 Feb 2025

Theoretical Physics Benchmark (TPBench) -- a Dataset and Study of AI Reasoning Capabilities in Theoretical Physics

Sai Chaitanya Tadepalli

AIMat

240

19 Feb 2025

Lean-ing on Quality: How High-Quality Data Beats Diverse Multilingual Data in AutoFormalization

Kai Fronsdal

Sanmi Koyejo

176

18 Feb 2025

Autoformalization in the Wild: Assessing LLMs on Real-World Mathematical Definitions

Lan Zhang

Marco Valentino

André Freitas

363

17 Feb 2025

Generating Millions Of Lean Theorems With Proofs By Exploring State Transition Graphs

David Yin

Jing Gao

218

16 Feb 2025

One Example Shown, Many Concepts Known! Counterexample-Driven Conceptual Reasoning in Mathematical LLMs

...

451

12 Feb 2025

A cross-regional review of AI safety regulations in the commercial aviation

Penny A. Barr

Sohel M. Imroz

317

12 Feb 2025

Examining False Positives under Inference Scaling for Mathematical Reasoning

399

10 Feb 2025

ATLAS: Autoformalizing Theorems through Lifting, Augmentation, and Synthesis of Data

357

08 Feb 2025

Automating Mathematical Proof Generation Using Large Language Model Agents and Knowledge Graphs

145

04 Feb 2025

Advanced Weakly-Supervised Formula Exploration for Neuro-Symbolic Mathematical Reasoning

Yuxuan Wu

Hideki Nakayama

NAI

244

02 Feb 2025

Understand, Solve and Translate: Bridging the Multilingual Mathematical Reasoning Gap

456

05 Jan 2025

Large Language Monkeys: Scaling Inference Compute with Repeated Sampling

928

571

03 Jan 2025

Mathematical Language Models: A Survey

...

612

03 Jan 2025

Formal Mathematical Reasoning: A New Frontier in AI

402

20 Dec 2024