Mathematical Capabilities of ChatGPT

Neural Information Processing Systems (NeurIPS), 2023
31 January 2023
Simon Frieder
Luca Pinchetti
Alexis Chevalier
Ryan-Rhys Griffiths
Tommaso Salvatori
Thomas Lukasiewicz
P. Petersen
Julius Berner
ELM AI4MH

Papers citing "Mathematical Capabilities of ChatGPT"

50 / 227 papers shown
Title
Sequential Enumeration in Large Language Models
Kuinan Hou
Marco Zorzi
Alberto Testolin
68
1
0
04 Dec 2025
CoT4AD: A Vision-Language-Action Model with Explicit Chain-of-Thought Reasoning for Autonomous Driving
Zhaohui Wang
Tengbo Yu
Hao Tang
LRM
136
0
0
27 Nov 2025
Enhancing Large Language Models for Automated Homework Assessment in Undergraduate Circuit Analysis
Liangliang Chen
Huiru Xie
Zhihao Qin
Yiming Guo
Jacqueline Rohde
Ying Zhang
81
0
0
22 Nov 2025
Trustworthy LLM-Mediated Communication: Evaluating Information Fidelity in LLM as a Communicator (LAAC) Framework in Multiple Application Domains
Mohammed Musthafa Rafi
Adarsh Krishnamurthy
Aditya Balu
124
0
0
06 Nov 2025
The ORCA Benchmark: Evaluating Real-World Calculation Accuracy in Large Language Models
Claudia Herambourg
Dawid Siuda
Julia Kopczyńska
Joao R. L. Santos
Wojciech Sas
Joanna Śmietańska-Nowak
ELM ALM LRM
382
0
0
04 Nov 2025
Test-Time Tuned Language Models Enable End-to-end De Novo Molecular Structure Generation from MS/MS Spectra
Laura Mismetti
Marvin Alberts
Andreas Krause
Mara Graziani
88
0
0
27 Oct 2025
Foundation of Intelligence: Review of Math Word Problems from Human Cognition Perspective
Zhenya Huang
Jiayu Liu
Xin Lin
Zhiyuan Ma
Shangzi Xue
Tong Xiao
Qi Liu
Yee Whye Teh
Enhong Chen
AIMat LRM
330
0
0
24 Oct 2025
Max It or Miss It: Benchmarking LLM On Solving Extremal Problems
Binxin Gao
Jingjun Han
ELM LRM
197
0
0
14 Oct 2025
RefGrader: Automated Grading of Mathematical Competition Proofs using Agentic Workflows
Hamed Mahdavi
Pouria Mahdavinia
Samira Malek
Pegah Mohammadipour
Alireza Hashemi
Majid Daliri
Alireza Farhadi
Amir Khasahmadi
Niloofar Mireshghallah
V. Honavar
146
1
0
10 Oct 2025
BrokenMath: A Benchmark for Sycophancy in Theorem Proving with LLMs
Ivo Petrov
Jasper Dekoninck
Martin Vechev
142
2
0
06 Oct 2025
IMProofBench: Benchmarking AI on Research-Level Mathematical Proof Generation
Johannes Schmitt
Gergely Bérczi
Jasper Dekoninck
Jeremy Feusi
Tim Gehrunger
...
Raúl Sánchez Galán
Zheming Sun
Josef Teichmann
Richard P. Thomas
Charles Vial
LRM
122
3
0
30 Sep 2025
WirelessMathLM: Teaching Mathematical Reasoning for LLMs in Wireless Communications with Reinforcement Learning
Xin Li
Mengbing Liu
Yiyang Zhu
W. Zhang
Li Wei
Jiancheng An
Chau Yuen
LRM
66
0
0
27 Sep 2025
Evaluating undergraduate mathematics examinations in the era of generative AI: a curriculum-level case study
Benjamin J. Walker
Nikoleta Kalaydzhieva
Beatriz Navarro Lameda
Ruth A. Reynolds
ELM
140
0
0
15 Sep 2025
Efficient Training-Free Online Routing for High-Volume Multi-LLM Serving
Fangzhou Wu
Sandeep Silwal
221
0
0
02 Sep 2025
A perishable ability? The future of writing in the face of generative artificial intelligence
Evandro L. T. P. Cunha
DeLMO
165
0
0
26 Aug 2025
Rethinking Reasoning in LLMs: Neuro-Symbolic Local RetoMaton Beyond ICL and CoT
Rushitha Santhoshi Mamidala
Anshuman Chhabra
Ankur Mali
OffRL LRM
118
0
0
22 Aug 2025
Structuring the Unstructured: A Systematic Review of Text-to-Structure Generation for Agentic AI with a Universal Evaluation Framework
Zheye Deng
Chunkit Chan
Tianshi Zheng
Wei Fan
Weiqi Wang
Yangqiu Song
116
3
0
17 Aug 2025
Too Easily Fooled? Prompt Injection Breaks LLMs on Frustratingly Simple Multiple-Choice Questions
Xuyang Guo
Zekai Huang
Zhao Song
Jiahao Zhang
LRM
140
3
0
16 Aug 2025
A Multi-Task Evaluation of LLMs' Processing of Academic Text Input
Tianyi Li
Yu Qin
Olivia R. Liu Sheng
89
0
0
15 Aug 2025
NLP Methods May Actually Be Better Than Professors at Estimating Question Difficulty
Leonidas Zotos
Ivo Pascal de Jong
Matias Valdenegro-Toro
A. Sburlea
Malvina Nissim
H. Rijn
148
0
0
05 Aug 2025
Causes in neuron diagrams, and testing causal reasoning in Large Language Models. A glimpse of the future of philosophy?
Louis Vervoort
Vitaly Nikolaev
158
0
0
17 Jun 2025
AbsenceBench: Language Models Can't Tell What's Missing
Harvey Yiyun Fu
Aryan Shrivastava
Jared Moore
Peter West
Chenhao Tan
Ari Holtzman
RALM
198
3
0
13 Jun 2025
MATP-BENCH: Can MLLM Be a Good Automated Theorem Prover for Multimodal Problems?
Zhitao He
Zongwei Lyu
Dazhong Chen
Dadi Guo
Yi R. Fung
LRM
220
5
0
06 Jun 2025
XToM: Exploring the Multilingual Theory of Mind for Large Language Models
Chunkit Chan
Yauwai Yim
Hongchuan Zeng
Zhiying Zou
Xinyuan Cheng
...
Ginny Wong
Helmut Schmid
Hinrich Schütze
Simon See
Yangqiu Song
LRM
201
0
0
03 Jun 2025
Evaluation of LLMs for mathematical problem solving
Ruonan Wang
Runxi Wang
Yunwen Shen
Chengfeng Wu
Qinglin Zhou
Rohitash Chandra
ELM LRM
382
2
0
30 May 2025
ASyMOB: Algebraic Symbolic Mathematical Operations Benchmark
M. Shalyt
Rotem Elimelech
I. Kaminer
141
3
0
28 May 2025
Born a Transformer -- Always a Transformer? On the Effect of Pretraining on Architectural Abilities
Yana Veitsman
Mayank Jobanputra
Yash Sarrof
Aleksandra Bakalova
Vera Demberg
Ellie Pavlick
Michael Hahn
466
2
0
27 May 2025
Two Causally Related Needles in a Video Haystack
Miaoyu Li
Qin Chao
Boyang Albert Li
CML
296
0
0
26 May 2025
Grammars of Formal Uncertainty: When to Trust LLMs in Automated Reasoning Tasks
Debargha Ganguly
Vikash Singh
Sreehari Sankar
Biyao Zhang
Xuecen Zhang
Srinivasan Iyengar
Xiaotian Han
Amit Sharma
Shivkumar Kalyanaraman
Vipin Chaudhary
291
2
0
26 May 2025
Small Models, Smarter Learning: The Power of Joint Task Training
C. Both
Benjamin Hoover
Hendrik Strobelt
Dmitry Krotov
Daniel Karl I. Weidele
Mauro Martino
Nima Dehmamy
206
0
0
23 May 2025
AutoMathKG: The automated mathematical knowledge graph based on LLM and vector database
International Conference on Climate Informatics (ICCI), 2025
Rong Bian
Yu Geng
Zijian Yang
Bing Cheng
516
2
0
19 May 2025
From Recall to Reasoning: Automated Question Generation for Deeper Math Learning through Large Language Models
International Conference on Artificial Intelligence in Education (AIED), 2025
Yongan Yu
Alexandre Krantz
Nikki G. Lobczowski
LRM
149
1
0
17 May 2025
Combining Bayesian Inference and Reinforcement Learning for Agent Decision Making: A Review
Chengmin Zhou
Ville Kyrki
Pasi Fränti
Laura Ruotsalainen
BDL AI4CE
404
0
0
12 May 2025
Assessment of Evolving Large Language Models in Upper Secondary Mathematics
Mika Setälä
Pieta Sikström
Ville Heilala
T. Karkkainen
ELM LRM
250
1
0
15 Apr 2025
Physics-informed KAN PointNet: Deep learning for simultaneous solutions to inverse problems in incompressible flow on numerous irregular geometries
Ali Kashefi
T. Mukerji
3DPC PINN
333
5
0
08 Apr 2025
Brains vs. Bytes: Evaluating LLM Proficiency in Olympiad Mathematics
Hamed Mahdavi
Alireza Hashemi
Majid Daliri
Pegah Mohammadipour
Alireza Farhadi
Samira Malek
Yekta Yazdanifard
Amir Khasahmadi
V. Honavar
ELM LRM
388
16
0
01 Apr 2025
Integrating Artificial Intelligence with Human Expertise: An In-depth Analysis of ChatGPT's Capabilities in Generating Metamorphic Relations
Yifan Zhang
Dave Towey
Matthew Pike
Q. Luu
Huai Liu
T. Chen
194
0
0
28 Mar 2025
Boosting Large Language Models with Mask Fine-Tuning
M. Zhang
Yue Bai
Huan Wang
Yizhou Wang
Qihua Dong
Y. Fu
CLL
207
2
0
27 Mar 2025
Proof or Bluff? Evaluating LLMs on 2025 USA Math Olympiad
Ivo Petrov
Jasper Dekoninck
Lyuben Baltadzhiev
Maria Drencheva
Kristian Minchev
Mislav Balunović
Nikola Jovanović
Martin Vechev
LRM ELM
492
57
0
27 Mar 2025
ORION: A Holistic End-to-End Autonomous Driving Framework by Vision-Language Instructed Action Generation
Haoyu Fu
Diankun Zhang
Zongchuang Zhao
Jianfeng Cui
Dingkang Liang
Chong Zhang
Dingyuan Zhang
Hongwei Xie
Bing Wang
Xiang Bai
341
54
0
25 Mar 2025
MathFlow: Enhancing the Perceptual Flow of MLLMs for Visual Mathematical Problems
Felix Chen
Hangjie Yuan
Yunqiu Xu
Tao Feng
Jun Cen
Pengwei Liu
Zeying Huang
Yi Yang
LRM
248
5
0
19 Mar 2025
Unlocking Learning Potentials: The Transformative Effect of Generative AI in Education Across Grade Levels
Meijuan Xie
Liling Luo
93
0
0
15 Mar 2025
Out-of-Context Reasoning in Large Language Models
Jonathan Shaki
Emanuele La Malfa
Michael Wooldridge
Sarit Kraus
LRM ReLM
398
0
0
13 Mar 2025
Numerical Error Analysis of Large Language Models
Stanislav Budzinskiy
Wenyi Fang
Longbin Zeng
Philipp Petersen
200
2
0
13 Mar 2025
SciHorizon: Benchmarking AI-for-Science Readiness from Scientific Data to Large Language Models
Chuan Qin
Xiusi Chen
Chengrui Wang
Pengmin Wu
Xi Chen
...
Han Wu
Chong Li
Yuanchun Zhou
H. Xiong
Hengshu Zhu
ELM
291
6
0
12 Mar 2025
Measure Twice, Cut Once: Grasping Video Structures and Event Semantics with LLMs for Video Temporal Localization
Zongshang Pang
Mayu Otani
Yuta Nakashima
311
3
0
12 Mar 2025
AlphaDrive: Unleashing the Power of VLMs in Autonomous Driving via Reinforcement Learning and Reasoning
Bo Jiang
Shaoyu Chen
Qian Zhang
Wenyu Liu
Xinggang Wang
OffRL LRM VLM
329
41
0
10 Mar 2025
PiCO: Peer Review in LLMs based on the Consistency Optimization
Hai-Jian Ke
Shuo Yang
Yu-Yang Liu
Jia-Yu Yao
Zhen-Hui Liu
Yu Wang
Ming Pang
Li Yuan
ALM
475
14
0
24 Feb 2025
Curie: Toward Rigorous and Automated Scientific Experimentation with AI Agents
Patrick Tser Jern Kon
Jiachen Liu
Qiuyi Ding
Yiming Qiu
Zhenning Yang
Yibo Huang
Jayanth Srinivasa
Myungjin Lee
Mosharaf Chowdhury
Ang Chen
342
16
0
22 Feb 2025
Testing GPT-4 with Wolfram Alpha and Code Interpreter plug-ins on math and science problems
E. Davis
S. Aaronson
ELM
354
26
0
21 Feb 2025