Mathematical Capabilities of ChatGPT
Neural Information Processing Systems (NeurIPS), 2023
31 January 2023
Simon Frieder
Luca Pinchetti
Alexis Chevalier
Ryan-Rhys Griffiths
Tommaso Salvatori
Thomas Lukasiewicz
P. Petersen
Julius Berner
ELM, AI4MH

Papers citing "Mathematical Capabilities of ChatGPT"

50 / 227 papers shown
On the robustness of ChatGPT in teaching Korean Mathematics
Phuong-Nam Nguyen
Quang Nguyen-The
An Vu-Minh
Diep-Anh Nguyen
Xuan-Lam Pham
RALM
133
1
0
17 Feb 2025
Selective Response Strategies for GenAI
Boaz Taitler
Omer Ben-Porat
356
5
0
02 Feb 2025
Tutor CoPilot: A Human-AI Approach for Scaling Real-Time Expertise
Rose E. Wang
Ana T. Ribeiro
Carly Robinson
Susanna Loeb
Dora Demszky
369
40
0
28 Jan 2025
ChartInsighter: An Approach for Mitigating Hallucination in Time-series Chart Summary Generation with A Benchmark Dataset
IEEE Transactions on Visualization and Computer Graphics (TVCG), 2025
Fen Wang
Bomiao Wang
Xueli Shu
Zhen Liu
Zekai Shao
Chao Liu
Siming Chen
AI4TS
223
9
0
17 Jan 2025
Formal Mathematical Reasoning: A New Frontier in AI
Kaiyu Yang
Gabriel Poesia
Jingxuan He
Wenda Li
Kristin Lauter
Swarat Chaudhuri
Dawn Song
LRM, AI4CE
402
66
0
20 Dec 2024
INCLUDE: Evaluating Multilingual Language Understanding with Regional Knowledge
International Conference on Learning Representations (ICLR), 2024
Angelika Romanou
Negar Foroutan
Anna Sotnikova
Zeming Chen
Sree Harsha Nelaturu
...
Mike Zhang
Imanol Schlag
Marzieh Fadaee
Sara Hooker
Antoine Bosselut
ELM
406
31
0
29 Nov 2024
Embracing AI in Education: Understanding the Surge in Large Language Model Use by Secondary Students
Tiffany Zhu
Kexun Zhang
William Yang Wang
SyDa, ELM, AI4Ed
194
6
0
27 Nov 2024
ChatGPT in Research and Education: Exploring Benefits and Threats
Abu Saleh Musa Miah
Md Mahbubur Rahman Tusher
Md. Moazzem Hossain
Md Mamun Hossain
M. Rahim
Md Ekramul Hamid
M. Islam
Jungpil Shin
110
2
0
05 Nov 2024
Senna: Bridging Large Vision-Language Models and End-to-End Autonomous Driving
Bo Jiang
Shaoyu Chen
Bencheng Liao
Xingyu Zhang
Wei Yin
Qian Zhang
Chang Huang
Wen Liu
Xinyu Wang
VLM, MLLM, LRM
307
77
0
29 Oct 2024
NewTerm: Benchmarking Real-Time New Terms for Large Language Models with Annual Updates
Neural Information Processing Systems (NeurIPS), 2024
Hexuan Deng
Wenxiang Jiao
Xuebo Liu
Min Zhang
Zhaopeng Tu
253
7
0
28 Oct 2024
Interchangeable Token Embeddings for Extendable Vocabulary and Alpha-Equivalence
İlker Işık
R. G. Cinbis
Ebru Aydin Gol
434
0
0
22 Oct 2024
Auto-PRE: An Automatic and Cost-Efficient Peer-Review Framework for Language Generation Evaluation
Junjie Chen
Weihang Su
Zhumin Chu
Haitao Li
Qinyao Ai
...
Jun Zhou
Y. Liu
Min Zhang
Shaoping Ma
Qingyao Ai
195
5
0
16 Oct 2024
QUITE: Quantifying Uncertainty in Natural Language Text in Bayesian Reasoning Scenarios
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Timo Pierre Schrader
Lukas Lange
Simon Razniewski
Annemarie Friedrich
UQLM
313
4
0
14 Oct 2024
HARDMath: A Benchmark Dataset for Challenging Problems in Applied Mathematics
International Conference on Learning Representations (ICLR), 2024
Jingxuan Fan
Sarah Martinson
Erik Y. Wang
Kaylie Hausknecht
Jonah Brenner
Danxian Liu
Nianli Peng
Corey Wang
Michael P. Brenner
136
24
0
13 Oct 2024
Low-Dimension-to-High-Dimension Generalization And Its Implications for Length Generalization
Yang Chen
Long Yang
Yitao Liang
Zhouchen Lin
358
2
0
11 Oct 2024
MaD-Scientist: AI-based Scientist solving Convection-Diffusion-Reaction Equations Using Massive PINN-Based Prior Data
Mingu Kang
Dongseok Lee
Woojin Cho
Jaehyeon Park
Kookjin Lee
Anthony Gruber
Youngjoon Hong
Noseong Park
DiffM, AI4CE
198
1
0
09 Oct 2024
MathHay: An Automated Benchmark for Long-Context Mathematical Reasoning in LLMs
Lei Wang
Shan Dong
Yuhui Xu
Hanze Dong
Yalu Wang
Amrita Saha
Ee-Peng Lim
Caiming Xiong
Doyen Sahoo
LRM
135
5
0
07 Oct 2024
Polymath: A Challenging Multi-modal Mathematical Reasoning Benchmark
Himanshu Gupta
Shreyas Verma
Ujjwala Anantheswaran
Kevin Scaria
Mihir Parmar
Swaroop Mishra
Chitta Baral
ReLM, LRM
257
19
0
06 Oct 2024
Persona Knowledge-Aligned Prompt Tuning Method for Online Debate
Chunkit Chan
Cheng Jiayang
Xin Liu
Yauwai Yim
Yuxin Jiang
Zheye Deng
Haoran Li
Yangqiu Song
Ginny Wong
Simon See
296
0
0
05 Oct 2024
ECon: On the Detection and Resolution of Evidence Conflicts
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Cheng Jiayang
Chunkit Chan
Qianqian Zhuang
Lin Qiu
Tianhang Zhang
Tengxiao Liu
Yangqiu Song
Yue Zhang
Pengfei Liu
Zheng Zhang
260
13
0
05 Oct 2024
Grounded-VideoLLM: Sharpening Fine-grained Temporal Grounding in Video Large Language Models
Haibo Wang
Zhiyang Xu
Yu Cheng
Shizhe Diao
Jiuxiang Gu
Yixin Cao
Qifan Wang
Weifeng Ge
Lifu Huang
262
55
0
04 Oct 2024
Easy2Hard-Bench: Standardized Difficulty Labels for Profiling LLM Performance and Generalization
Neural Information Processing Systems (NeurIPS), 2024
Mucong Ding
Chenghao Deng
Jocelyn Choo
Zichu Wu
Aakriti Agrawal
...
Wanrong Zhu
Tom Goldstein
John Langford
Anima Anandkumar
Furong Huang
341
7
0
27 Sep 2024
E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding
Neural Information Processing Systems (NeurIPS), 2024
Ye Liu
Zongyang Ma
Chen Ma
Yang Wu
Ying Shan
Chang Wen Chen
267
52
0
26 Sep 2024
Constrained Reasoning Chains for Enhancing Theory-of-Mind in Large Language Models
Pacific Rim International Conference on Artificial Intelligence (PRICAI), 2024
Zizheng Lin
Chunkit Chan
Yangqiu Song
Xin Liu
LRM
244
3
0
20 Sep 2024
System 2 thinking in OpenAI's o1-preview model: Near-perfect performance on a mathematics exam
De Computis (DC), 2024
J. D. Winter
Dimitra Dodou
Y. B. Eisma
VLM, ELM, LRM, ReLM
300
20
0
19 Sep 2024
Linguini: A benchmark for language-agnostic linguistic reasoning
Eduardo Sánchez
Belen Alastruey
C. Ropers
Pontus Stenetorp
Mikel Artetxe
Marta R. Costa-jussá
ReLM, ELM, LRM
276
12
0
18 Sep 2024
Large Language Models in Drug Discovery and Development: From Disease Mechanisms to Clinical Trials
Yizhen Zheng
Huan Yee Koh
M. Yang
Li Li
Lauren T. May
Geoffrey I. Webb
Shirui Pan
George Church
LM&MA
233
46
0
06 Sep 2024
Interpreting and Improving Large Language Models in Arithmetic Calculation
International Conference on Machine Learning (ICML), 2024
Wei Zhang
Chaoqun Wan
Yonggang Zhang
Yiu-ming Cheung
Xinmei Tian
Xu Shen
Jieping Ye
LRM
323
36
0
03 Sep 2024
iToT: An Interactive System for Customized Tree-of-Thought Generation
Alan Boyle
Isha Gupta
Sebastian Hönig
Lukas Mautner
Kenza Amara
Furui Cheng
Mennatallah El-Assady
LRM, LM&Ro
191
2
0
31 Aug 2024
Do GPT Language Models Suffer From Split Personality Disorder? The Advent Of Substrate-Free Psychometrics
P. Romero
Stephen Fitz
T. Nakatsuma
141
12
0
14 Aug 2024
Evaluating and Enhancing LLMs Agent based on Theory of Mind in Guandan: A Multi-Player Cooperative Game under Imperfect Information
Yauwai Yim
Chunkit Chan
Tianyu Shi
Zheye Deng
Wei Fan
Tianshi Zheng
Yangqiu Song
LLMAG
302
20
0
05 Aug 2024
Is Your Model Really A Good Math Reasoner? Evaluating Mathematical Reasoning with Checklist
Zihao Zhou
Shudong Liu
Maizhen Ning
Wei Liu
Jindong Wang
Yang Li
Xiaowei Huang
Qiufeng Wang
Kaizhu Huang
ELM, LRM
243
45
0
11 Jul 2024
From Data to Commonsense Reasoning: The Use of Large Language Models for Explainable AI
Stefanie Krause
Frieder Stolzenburg
ELM, LRM
219
4
0
04 Jul 2024
MathOdyssey: Benchmarking Mathematical Problem-Solving Skills in Large Language Models Using Odyssey Math Data
Meng Fang
Xiangpeng Wan
Fei Lu
Fei Xing
Kai Zou
211
49
0
26 Jun 2024
A Moonshot for AI Oracles in the Sciences
Bryan Kaiser
Tailin Wu
Maike Sonnewald
Colin Thackray
Skylar Callis
AI4CE
202
1
0
25 Jun 2024
Modulating Language Model Experiences through Frictions
Katherine M. Collins
Valerie Chen
Ilia Sucholutsky
Hannah Rose Kirk
Malak Sadek
Holli Sargeant
Ameet Talwalkar
Adrian Weller
Umang Bhatt
KELM
210
9
0
24 Jun 2024
Évaluation des capacités de réponse de larges modèles de langage (LLM) pour des questions d'historiens
M. Chartier
Nabil Dakkoune
G. Bourgeois
Stéphane Jean
KELM, ELM
116
1
0
21 Jun 2024
Do Large Language Models Exhibit Cognitive Dissonance? Studying the Difference Between Revealed Beliefs and Stated Answers
Manuel Mondal
Ljiljana Dolamic
Gérôme Bovet
Philippe Cudré-Mauroux
Julien Audiffren
453
4
0
21 Jun 2024
Relational Learning in Pre-Trained Models: A Theory from Hypergraph Recovery Perspective
Yang Chen
Cong Fang
Zhouchen Lin
Bing Liu
221
2
0
17 Jun 2024
Pre-trained Large Language Models Use Fourier Features to Compute Addition
Tianyi Zhou
Deqing Fu
Willie Neiswanger
Robin Jia
LRM
263
29
0
05 Jun 2024
Alice in Wonderland: Simple Tasks Showing Complete Reasoning Breakdown in State-Of-the-Art Large Language Models
Marianna Nezhurina
Lucia Cipolina-Kun
Mehdi Cherti
J. Jitsev
LLMAG, LRM, ELM, ReLM
813
58
0
04 Jun 2024
Applying Fine-Tuned LLMs for Reducing Data Needs in Load Profile Analysis
Yi Hu
Hyeonjin Kim
Kai Ye
Ning Lu
215
18
0
02 Jun 2024
Evaluating Mathematical Reasoning of Large Language Models: A Focus on Error Identification and Correction
Xiaoyuan Li
Wenjie Wang
Moxin Li
Junrong Guo
Yang Zhang
Fuli Feng
ELM, LRM
234
41
0
02 Jun 2024
Models That Prove Their Own Correctness
Noga Amit
S. Goldwasser
Orr Paradise
G. Rothblum
LRM
448
5
0
24 May 2024
Investigating Symbolic Capabilities of Large Language Models
Neisarg Dave
Daniel Kifer
C. Lee Giles
A. Mali
ELM, LRM
167
4
0
21 May 2024
Can formal argumentative reasoning enhance LLMs performances?
Federico Castagna
I. Sassoon
Simon Parsons
LRM, LLMAG
138
3
0
16 May 2024
Exploring the Impact of ChatGPT on Wikipedia Engagement
Neal Reeves
Wenjie Yin
Elena Simperl
KELM
195
7
0
16 May 2024
The AI Companion in Education: Analyzing the Pedagogical Potential of ChatGPT in Computer Science and Engineering
Z. He
Thomas Nguyen
Tahereh Miari
Mehrdad Aliasgari
S. Rafatirad
Hossein Sayadi
107
7
0
23 Apr 2024
NegotiationToM: A Benchmark for Stress-testing Machine Theory of Mind on Negotiation Surrounding
Chunkit Chan
Cheng Jiayang
Yauwai Yim
Zheye Deng
Wei Fan
Haoran Li
Xin Liu
Hongming Zhang
Weiqi Wang
Yangqiu Song
LLMAG
236
38
0
21 Apr 2024
Large Language Models as Test Case Generators: Performance Evaluation and Enhancement
Ke-Shen Li
Shijie Cao
LLMAG
180
40
0
20 Apr 2024