arXiv: 2301.13867
Mathematical Capabilities of ChatGPT

Neural Information Processing Systems (NeurIPS), 2023

31 January 2023

Alexis Chevalier

Ryan-Rhys Griffiths

Tommaso Salvatori

Thomas Lukasiewicz

Papers citing "Mathematical Capabilities of ChatGPT"

50 / 227 papers shown

Sequential Enumeration in Large Language Models

Alberto Testolin

129

1

0

04 Dec 2025

CoT4AD: A Vision-Language-Action Model with Explicit Chain-of-Thought Reasoning for Autonomous Driving

169

0

0

27 Nov 2025

Enhancing Large Language Models for Automated Homework Assessment in Undergraduate Circuit Analysis

Liangliang Chen

Jacqueline Rohde

97

2

0

22 Nov 2025

Trustworthy LLM-Mediated Communication: Evaluating Information Fidelity in LLM as a Communicator (LAAC) Framework in Multiple Application Domains

Mohammed Musthafa Rafi

Adarsh Krishnamurthy

125

0

0

06 Nov 2025

The ORCA Benchmark: Evaluating Real-World Calculation Accuracy in Large Language Models

Claudia Herambourg

Julia Kopczyńska

Joao R. L. Santos

Joanna Śmietańska-Nowak

400

0

0

04 Nov 2025

Test-Time Tuned Language Models Enable End-to-end De Novo Molecular Structure Generation from MS/MS Spectra

101

0

0

27 Oct 2025

Foundation of Intelligence: Review of Math Word Problems from Human Cognition Perspective

352

0

0

24 Oct 2025

Max It or Miss It: Benchmarking LLM On Solving Extremal Problems

213

0

0

14 Oct 2025

RefGrader: Automated Grading of Mathematical Competition Proofs using Agentic Workflows

Pouria Mahdavinia

Pegah Mohammadipour

Alireza Hashemi

Alireza Farhadi

Amir Khasahmadi

Niloofar Mireshghallah

162

1

0

10 Oct 2025

BrokenMath: A Benchmark for Sycophancy in Theorem Proving with LLMs

Jasper Dekoninck

153

4

0

06 Oct 2025

IMProofBench: Benchmarking AI on Research-Level Mathematical Proof Generation

Johannes Schmitt

Gergely Bérczi

Jasper Dekoninck

...

Raúl Sánchez Galán

Josef Teichmann

Richard P. Thomas

136

3

0

30 Sep 2025

WirelessMathLM: Teaching Mathematical Reasoning for LLMs in Wireless Communications with Reinforcement Learning

72

0

0

27 Sep 2025

Evaluating undergraduate mathematics examinations in the era of generative AI: a curriculum-level case study

Benjamin J. Walker

Nikoleta Kalaydzhieva

Beatriz Navarro Lameda

Ruth A. Reynolds

198

0

0

15 Sep 2025

Efficient Training-Free Online Routing for High-Volume Multi-LLM Serving

235

0

0

02 Sep 2025

A perishable ability? The future of writing in the face of generative artificial intelligence

Evandro L. T. P. Cunha

175

0

0

26 Aug 2025

Rethinking Reasoning in LLMs: Neuro-Symbolic Local RetoMaton Beyond ICL and CoT

Rushitha Santhoshi Mamidala

Anshuman Chhabra

128

0

0

22 Aug 2025

Structuring the Unstructured: A Systematic Review of Text-to-Structure Generation for Agentic AI with a Universal Evaluation Framework

129

3

0

17 Aug 2025

Too Easily Fooled? Prompt Injection Breaks LLMs on Frustratingly Simple Multiple-Choice Questions

144

3

0

16 Aug 2025

A Multi-Task Evaluation of LLMs' Processing of Academic Text Input

Olivia R. Liu Sheng

119

0

0

15 Aug 2025

NLP Methods May Actually Be Better Than Professors at Estimating Question Difficulty

Ivo Pascal de Jong

Matias Valdenegro-Toro

172

1

0

05 Aug 2025

Causes in neuron diagrams, and testing causal reasoning in Large Language Models. A glimpse of the future of philosophy?

Vitaly Nikolaev

166

0

0

17 Jun 2025

AbsenceBench: Language Models Can't Tell What's Missing

Harvey Yiyun Fu

Aryan Shrivastava

206

4

0

13 Jun 2025

MATP-BENCH: Can MLLM Be a Good Automated Theorem Prover for Multimodal Problems?

235

7

0

06 Jun 2025

XToM: Exploring the Multilingual Theory of Mind for Large Language Models

...

Hinrich Schütze

204

0

0

03 Jun 2025

Evaluation of LLMs for mathematical problem solving

Rohitash Chandra

399

2

0

30 May 2025

ASyMOB: Algebraic Symbolic Mathematical Operations Benchmark

Rotem Elimelech

153

3

0

28 May 2025

Born a Transformer -- Always a Transformer? On the Effect of Pretraining on Architectural Abilities

Mayank Jobanputra

Aleksandra Bakalova

476

2

0

27 May 2025

Two Causally Related Needles in a Video Haystack

Boyang Albert Li

308

0

0

26 May 2025

Grammars of Formal Uncertainty: When to Trust LLMs in Automated Reasoning Tasks

Debargha Ganguly

Sreehari Sankar

Srinivasan Iyengar

Shivkumar Kalyanaraman

Vipin Chaudhary

307

2

0

26 May 2025

Small Models, Smarter Learning: The Power of Joint Task Training

Benjamin Hoover

Hendrik Strobelt

Daniel Karl I. Weidele

240

0

0

23 May 2025

AutoMathKG: The automated mathematical knowledge graph based on LLM and vector database

International Conference on Climate Informatics (ICCI), 2025

539

2

0

19 May 2025

From Recall to Reasoning: Automated Question Generation for Deeper Math Learning through Large Language Models

International Conference on Artificial Intelligence in Education (AIED), 2025

Alexandre Krantz

Nikki G. Lobczowski

198

2

0

17 May 2025

Combining Bayesian Inference and Reinforcement Learning for Agent Decision Making: A Review

Laura Ruotsalainen

431

0

0

12 May 2025

Assessment of Evolving Large Language Models in Upper Secondary Mathematics

Pieta Sikström

261

1

0

15 Apr 2025

Physics-informed KAN PointNet: Deep learning for simultaneous solutions to inverse problems in incompressible flow on numerous irregular geometries

363

5

0

08 Apr 2025

Brains vs. Bytes: Evaluating LLM Proficiency in Olympiad Mathematics

Alireza Hashemi

Pegah Mohammadipour

Alireza Farhadi

Yekta Yazdanifard

Amir Khasahmadi

442

16

0

01 Apr 2025

Integrating Artificial Intelligence with Human Expertise: An In-depth Analysis of ChatGPT's Capabilities in Generating Metamorphic Relations

210

0

0

28 Mar 2025

Boosting Large Language Models with Mask Fine-Tuning

239

2

0

27 Mar 2025

Proof or Bluff? Evaluating LLMs on 2025 USA Math Olympiad

Jasper Dekoninck

Lyuben Baltadzhiev

Maria Drencheva

Kristian Minchev

Mislav Balunović

Nikola Jovanović

516

59

0

27 Mar 2025

ORION: A Holistic End-to-End Autonomous Driving Framework by Vision-Language Instructed Action Generation

Zongchuang Zhao

362

60

0

25 Mar 2025

MathFlow: Enhancing the Perceptual Flow of MLLMs for Visual Mathematical Problems

280

6

0

19 Mar 2025

Unlocking Learning Potentials: The Transformative Effect of Generative AI in Education Across Grade Levels

101

0

0

15 Mar 2025

Out-of-Context Reasoning in Large Language Models

Emanuele La Malfa

Michael Wooldridge

445

0

0

13 Mar 2025

Numerical Error Analysis of Large Language Models

Stanislav Budzinskiy

Philipp Petersen

225

2

0

13 Mar 2025

SciHorizon: Benchmarking AI-for-Science Readiness from Scientific Data to Large Language Models

...

312

6

0

12 Mar 2025

Measure Twice, Cut Once: Grasping Video Structures and Event Semantics with LLMs for Video Temporal Localization

347

3

0

12 Mar 2025

AlphaDrive: Unleashing the Power of VLMs in Autonomous Driving via Reinforcement Learning and Reasoning

343

46

0

10 Mar 2025

PiCO: Peer Review in LLMs based on the Consistency Optimization

507

14

0

24 Feb 2025

Curie: Toward Rigorous and Automated Scientific Experimentation with AI Agents

Patrick Tser Jern Kon

Jayanth Srinivasa

Mosharaf Chowdhury

396

19

0

22 Feb 2025

Testing GPT-4 with Wolfram Alpha and Code Interpreter plug-ins on math and science problems

384

27

0

21 Feb 2025
