ResearchTrend.AI
MiniF2F: a cross-system benchmark for formal Olympiad-level mathematics

International Conference on Learning Representations (ICLR), 2021
31 August 2021
Kunhao Zheng
Jesse Michael Han
Stanislas Polu
    AIMat
arXiv:2109.00110

Papers citing "MiniF2F: a cross-system benchmark for formal Olympiad-level mathematics"

Showing 50 of 170 citing papers
Safe: Enhancing Mathematical Reasoning in Large Language Models via Retrospective Step-aware Formal Verification
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Chengwu Liu
Ye Yuan
Yichun Yin
Yan Xu
Xin Xu
Zaoyu Chen
Yasheng Wang
Lifeng Shang
Qun Liu
Ming Zhang
LRM
370
7
0
05 Jun 2025
STORM-BORN: A Challenging Mathematical Derivations Dataset Curated via a Human-in-the-Loop Multi-Agent Framework
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Wenhao Liu
Zhenyi Lu
Xinyu Hu
Jierui Zhang
Dailin Li
...
Pei Zhang
Chengbo Zhang
Yuxiang Ren
Xiaohong Huang
Yan Ma
OffRL
295
3
0
02 Jun 2025
ORMind: A Cognitive-Inspired End-to-End Reasoning Framework for Operations Research
Zhiyuan Wang
Bokui Chen
Yinya Huang
Qingxing Cao
Ming He
Jianping Fan
Xiaodan Liang
LRM
260
4
0
02 Jun 2025
SiLVR: A Simple Language-based Video Reasoning Framework
Ce Zhang
Yan-Bo Lin
Ziyang Wang
Mohit Bansal
Gedas Bertasius
LRM
185
7
0
30 May 2025
Autoformalization in the Era of Large Language Models: A Survey
Ke Weng
Lun Du
Sirui Li
Wangyue Lu
Haozhe Sun
Hengyu Liu
Tiancheng Zhang
AI4CE LRM
340
9
0
29 May 2025
DeepTheorem: Advancing LLM Reasoning for Theorem Proving Through Natural Language and Reinforcement Learning
Ziyin Zhang
Jiahao Xu
Zhiwei He
Tian Liang
Qiuzhi Liu
...
Zhuosheng Zhang
Rui Wang
Zhaopeng Tu
Haitao Mi
Dong Yu
OffRL LRM
307
10
0
29 May 2025
Let's Reason Formally: Natural-Formal Hybrid Reasoning Enhances LLM's Math Capability
Ruida Wang
Yuxin Li
Yi R. Fung
LRM
354
7
0
29 May 2025
ASyMOB: Algebraic Symbolic Mathematical Operations Benchmark
M. Shalyt
Rotem Elimelech
I. Kaminer
153
3
0
28 May 2025
Decomposing Elements of Problem Solving: What "Math" Does RL Teach?
Tian Qin
Core Francisco Park
Mujin Kwun
Aaron Walsman
Eran Malach
Nikhil Anand
Hidenori Tanaka
David Alvarez-Melis
ReLM OffRL LRM
207
4
0
28 May 2025
SMART: Self-Generating and Self-Validating Multi-Dimensional Assessment for LLMs' Mathematical Problem Solving
Yujie Hou
Ting Zhang
Mei Wang
Xuetao Ma
Hua Huang
LRM
472
0
0
22 May 2025
HybridProver: Augmenting Theorem Proving with LLM-Driven Proof Synthesis and Refinement
Jilin Hu
Jianyu Zhang
Yongwang Zhao
Talia Ringer
160
2
0
21 May 2025
CLEVER: A Curated Benchmark for Formally Verified Code Generation
Amitayush Thakur
Jasper Lee
George Tsoukalas
Meghana Sistla
Matthew Zhao
Stefan Zetzsche
Greg Durrett
Yisong Yue
Swarat Chaudhuri
ALM
500
9
0
20 May 2025
LEXam: Benchmarking Legal Reasoning on 340 Law Exams
Yu Fan
Jingwei Ni
Jakob Merane
Etienne Salimbeni
Yoan Hermstrüwer
...
Mrinmaya Sachan
Alexander Stremitzer
Christoph Engel
Elliott Ash
Joel Niklaus
AILaw ELM
545
11
0
19 May 2025
Ineq-Comp: Benchmarking Human-Intuitive Compositional Reasoning in Automated Theorem Proving on Inequalities
Haoyu Zhao
Yihan Geng
Shange Tang
Yong Lin
Bohan Lyu
Hongzhou Lin
Chi Jin
Sanjeev Arora
363
5
0
19 May 2025
LLM-based Automated Theorem Proving Hinges on Scalable Synthetic Data Generation
Junyu Lai
Jiakun Zhang
Shuo Xu
Taolue Chen
Zihang Wang
Yao Yang
Jiarui Zhang
Chun Cao
Jingwei Xu
300
1
0
17 May 2025
MPS-Prover: Advancing Stepwise Theorem Proving by Multi-Perspective Search and Data Curation
Zhenwen Liang
Linfeng Song
Yang Li
Tao Yang
Feng Zhang
Haitao Mi
Dong Yu
LRM
320
8
0
16 May 2025
Beyond Theorem Proving: Formulation, Framework and Benchmark for Formal Problem-Solving
Zijun Chen
Xinhao Zheng
Renqiu Xia
Xingzhi Qi
Qinxiang Cao
Junchi Yan
AIMat
319
1
0
07 May 2025
CombiBench: Benchmarking LLM Capability for Combinatorial Mathematics
Qingbin Liu
Xiaohan Lin
Jonas Bayer
Yael Dillies
Weijie Jiang
...
Zhengfeng Yang
Jiawei Zhang
Lihong Zhi
Jia-Nan Li
Zhengying Liu
580
14
0
06 May 2025
FormalMATH: Benchmarking Formal Mathematical Reasoning of Large Language Models
Zhouliang Yu
Ruotian Peng
Keyi Ding
Yiming Li
Zhongyuan Peng
...
Huajian Xin
Wenjie Huang
Yandong Wen
Ge Zhang
Weiyang Liu
LRM
748
17
0
05 May 2025
Hierarchical Attention Generates Better Proofs
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Jianlong Chen
Chao Li
Yang Yuan
Andrew Chi-Chih Yao
AIMat LRM
231
0
0
27 Apr 2025
APE-Bench I: Towards File-level Automated Proof Engineering of Formal Math Libraries
Huajian Xin
Luming Li
Xiaoran Jin
Jacques Fleuriot
Wenda Li
AIMat
290
2
0
27 Apr 2025
Neural Theorem Proving: Generating and Structuring Proofs for Formal Verification
Balaji Rao
William Eiers
Carlo Lipizzi
414
2
0
23 Apr 2025
Kimina-Prover Preview: Towards Large Formal Reasoning Models with Reinforcement Learning
Haiming Wang
Mert Unsal
Xiaohan Lin
Mantas Baksys
Qingbin Liu
...
Zhouliang Yu
Liang Luo
Zhilin Yang
Zhengying Liu
Jia-Nan Li
AIMat ReLM AI4TS LRM
334
86
0
15 Apr 2025
Reasoning Models Can Be Effective Without Thinking
Wenjie Ma
Jingxuan He
Charlie Snell
Tyler Griggs
Sewon Min
Matei A. Zaharia
ReLM LRM
423
109
1
14 Apr 2025
Leanabell-Prover: Posttraining Scaling in Formal Reasoning
Jingyuan Zhang
Qi Wang
Xingguang Ji
Wenshu Fan
Yang Yue
Fuzheng Zhang
Di Zhang
Guorui Zhou
Kun Gai
LRM
483
18
0
08 Apr 2025
Brains vs. Bytes: Evaluating LLM Proficiency in Olympiad Mathematics
Hamed Mahdavi
Alireza Hashemi
Majid Daliri
Pegah Mohammadipour
Alireza Farhadi
Samira Malek
Yekta Yazdanifard
Amir Khasahmadi
V. Honavar
ELM LRM
428
16
0
01 Apr 2025
Proof or Bluff? Evaluating LLMs on 2025 USA Math Olympiad
Ivo Petrov
Jasper Dekoninck
Lyuben Baltadzhiev
Maria Drencheva
Kristian Minchev
Mislav Balunović
Nikola Jovanović
Martin Vechev
LRM ELM
505
59
0
27 Mar 2025
Rosetta-PL: Propositional Logic as a Benchmark for Large Language Model Reasoning
North American Chapter of the Association for Computational Linguistics (NAACL), 2025
Shaun Baek
Shaun Esua-Mensah
Cyrus Tsui
Sejan Vigneswaralingam
Abdullah Alali
Michael Lu
Sean O Brien
Sean O'Brien
Kevin Zhu
LRM
671
1
0
25 Mar 2025
A Survey on Mathematical Reasoning and Optimization with Large Language Models
Ali Forootani
OffRL LRM AI4CE
308
11
0
22 Mar 2025
Local Look-Ahead Guidance via Verifier-in-the-Loop for Automated Theorem Proving
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Sara Rajaee
Kumar Pratik
Gabriele Cesa
Arash Behboodi
OffRL LRM
366
2
0
12 Mar 2025
FANS -- Formal Answer Selection for Natural Language Math Reasoning Using Lean4
Jiarui Yao
Ruida Wang
Tong Zhang
LRM
341
2
0
05 Mar 2025
From Hypothesis to Publication: A Comprehensive Survey of AI-Driven Research Support Systems
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2025
Zekun Zhou
Xiaocheng Feng
Daigang Xu
Xiachong Feng
Ziyun Song
...
Baoxin Wang
Dayong Wu
Guoping Hu
Ting Liu
Bing Qin
AI4TS
503
7
0
03 Mar 2025
CuDIP: Enhancing Theorem Proving in LLMs via Curriculum Learning-based Direct Preference Optimization
Shuming Shi
Ruobing Zuo
Gaolei He
Jianlin Wang
Chenyang Xu
Zhengfeng Yang
344
0
0
25 Feb 2025
Steering LLMs for Formal Theorem Proving
Shashank Kirtania
Arun Shankar Iyer
LLMSV
1.1K
0
0
21 Feb 2025
A Survey on Feedback-based Multi-step Reasoning for Large Language Models on Mathematics
Ting-Ruen Wei
Haowei Liu
Xuyang Wu
Yi Fang
LRM AI4CE ReLM KELM
741
8
0
21 Feb 2025
Simplifying Formal Proof-Generating Models with ChatGPT and Basic Searching Techniques
Sangjun Han
Taeil Hur
Youngmi Hur
Kathy Sangkyung Lee
Myungyoon Lee
Hyojae Lim
1.1K
0
0
20 Feb 2025
Theoretical Physics Benchmark (TPBench) -- a Dataset and Study of AI Reasoning Capabilities in Theoretical Physics
Daniel J.H. Chung
Zhiqi Gao
Yurii Kvasiuk
Tianyi Li
Moritz Münchmeyer
Maja Rudolph
Frederic Sala
Sai Chaitanya Tadepalli
AIMat
240
18
0
19 Feb 2025
Lean-ing on Quality: How High-Quality Data Beats Diverse Multilingual Data in AutoFormalization
Willy Chan
Michael Souliman
Jakob Nordhagen
Alycia Lee
Elyas Obbad
Kai Fronsdal
Sanmi Koyejo
176
3
0
18 Feb 2025
Autoformalization in the Wild: Assessing LLMs on Real-World Mathematical Definitions
Lan Zhang
Marco Valentino
André Freitas
363
2
0
17 Feb 2025
Generating Millions Of Lean Theorems With Proofs By Exploring State Transition Graphs
David Yin
Jing Gao
218
1
0
16 Feb 2025
One Example Shown, Many Concepts Known! Counterexample-Driven Conceptual Reasoning in Mathematical LLMs
Hai-Tao Zheng
Jiayi Kuang
Haojing Huang
Zhikun Xu
Xinnian Liang
...
Jue Chen
Chao Qu
Ying Shen
Hai-Tao Zheng
Philip S. Yu
LRM
451
12
0
12 Feb 2025
A cross-regional review of AI safety regulations in the commercial aviation
Penny A. Barr
Sohel M. Imroz
317
0
0
12 Feb 2025
Examining False Positives under Inference Scaling for Mathematical Reasoning
Yu Guang Wang
Nan Yang
Liang Wang
Furu Wei
Fuli Feng
LRM
399
8
0
10 Feb 2025
ATLAS: Autoformalizing Theorems through Lifting, Augmentation, and Synthesis of Data
Xiaoyang Liu
Kangjie Bao
Jiashuo Zhang
Yunqi Liu
Yu Chen
Yu Chen
Yang Jiao
Tao Luo
AIMat
357
12
0
08 Feb 2025
Automating Mathematical Proof Generation Using Large Language Model Agents and Knowledge Graphs
Vincent Li
Tim Knappe
Yule Fu
Kevin Han
Kevin Zhu
LRM
145
0
0
04 Feb 2025
Advanced Weakly-Supervised Formula Exploration for Neuro-Symbolic Mathematical Reasoning
Yuxuan Wu
Hideki Nakayama
NAI
244
1
0
02 Feb 2025
Understand, Solve and Translate: Bridging the Multilingual Mathematical Reasoning Gap
Hyunwoo Ko
Guijin Son
Dasol Choi
RALM LRM
456
26
0
05 Jan 2025
Large Language Monkeys: Scaling Inference Compute with Repeated Sampling
Bradley Brown
Jordan Juravsky
Ryan Ehrlich
Ronald Clark
Quoc V. Le
Christopher Ré
Azalia Mirhoseini
ALM LRM
928
571
0
03 Jan 2025
Mathematical Language Models: A Survey
Wen Liu
Hanglei Hu
Jie Zhou
Yuyang Ding
Junsong Li
...
Mengliang He
Qin Chen
Bo Jiang
Aimin Zhou
Liang He
LRM
612
21
0
03 Jan 2025
Formal Mathematical Reasoning: A New Frontier in AI
Kaiyu Yang
Gabriel Poesia
Jingxuan He
Wenda Li
Kristin Lauter
Swarat Chaudhuri
Dawn Song
LRM AI4CE
402
66
0
20 Dec 2024