2301.13867
Mathematical Capabilities of ChatGPT
Neural Information Processing Systems (NeurIPS), 2023
31 January 2023
Simon Frieder
Luca Pinchetti
Alexis Chevalier
Ryan-Rhys Griffiths
Tommaso Salvatori
Thomas Lukasiewicz
P. Petersen
Julius Berner
ELM
AI4MH
Papers citing "Mathematical Capabilities of ChatGPT"
50 / 227 papers shown
On the robustness of ChatGPT in teaching Korean Mathematics
Phuong-Nam Nguyen
Quang Nguyen-The
An Vu-Minh
Diep-Anh Nguyen
Xuan-Lam Pham
RALM
133
1
0
17 Feb 2025
Selective Response Strategies for GenAI
Boaz Taitler
Omer Ben-Porat
356
5
0
02 Feb 2025
Tutor CoPilot: A Human-AI Approach for Scaling Real-Time Expertise
Rose E. Wang
Ana T. Ribeiro
Carly Robinson
Susanna Loeb
Dora Demszky
369
40
0
28 Jan 2025
ChartInsighter: An Approach for Mitigating Hallucination in Time-series Chart Summary Generation with A Benchmark Dataset
IEEE Transactions on Visualization and Computer Graphics (TVCG), 2025
Fen Wang
Bomiao Wang
Xueli Shu
Zhen Liu
Zekai Shao
Chao Liu
Siming Chen
AI4TS
223
9
0
17 Jan 2025
Formal Mathematical Reasoning: A New Frontier in AI
Kaiyu Yang
Gabriel Poesia
Jingxuan He
Wenda Li
Kristin Lauter
Swarat Chaudhuri
Dawn Song
LRM
AI4CE
402
66
0
20 Dec 2024
INCLUDE: Evaluating Multilingual Language Understanding with Regional Knowledge
International Conference on Learning Representations (ICLR), 2024
Angelika Romanou
Negar Foroutan
Anna Sotnikova
Zeming Chen
Sree Harsha Nelaturu
...
Mike Zhang
Imanol Schlag
Marzieh Fadaee
Sara Hooker
Antoine Bosselut
ELM
406
31
0
29 Nov 2024
Embracing AI in Education: Understanding the Surge in Large Language Model Use by Secondary Students
Tiffany Zhu
Kexun Zhang
William Yang Wang
SyDa
ELM
AI4Ed
194
6
0
27 Nov 2024
ChatGPT in Research and Education: Exploring Benefits and Threats
Abu Saleh Musa Miah
Md Mahbubur Rahman Tusher
Md. Moazzem Hossain
Md Mamun Hossain
M. Rahim
Md Ekramul Hamid
M. Islam
Jungpil Shin
110
2
0
05 Nov 2024
Senna: Bridging Large Vision-Language Models and End-to-End Autonomous Driving
Bo Jiang
Shaoyu Chen
Bencheng Liao
Xingyu Zhang
Wei Yin
Qian Zhang
Chang Huang
Wen Liu
Xinyu Wang
VLM
MLLM
LRM
307
77
0
29 Oct 2024
NewTerm: Benchmarking Real-Time New Terms for Large Language Models with Annual Updates
Neural Information Processing Systems (NeurIPS), 2024
Hexuan Deng
Wenxiang Jiao
Xuebo Liu
Min Zhang
Zhaopeng Tu
253
7
0
28 Oct 2024
Interchangeable Token Embeddings for Extendable Vocabulary and Alpha-Equivalence
İlker Işık
R. G. Cinbis
Ebru Aydin Gol
434
0
0
22 Oct 2024
Auto-PRE: An Automatic and Cost-Efficient Peer-Review Framework for Language Generation Evaluation
Junjie Chen
Weihang Su
Zhumin Chu
Haitao Li
Qinyao Ai
...
Jun Zhou
Y. Liu
Min Zhang
Shaoping Ma
Qingyao Ai
195
5
0
16 Oct 2024
QUITE: Quantifying Uncertainty in Natural Language Text in Bayesian Reasoning Scenarios
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Timo Pierre Schrader
Lukas Lange
Simon Razniewski
Annemarie Friedrich
UQLM
313
4
0
14 Oct 2024
HARDMath: A Benchmark Dataset for Challenging Problems in Applied Mathematics
International Conference on Learning Representations (ICLR), 2024
Jingxuan Fan
Sarah Martinson
Erik Y. Wang
Kaylie Hausknecht
Jonah Brenner
Danxian Liu
Nianli Peng
Corey Wang
Michael P. Brenner
136
24
0
13 Oct 2024
Low-Dimension-to-High-Dimension Generalization And Its Implications for Length Generalization
Yang Chen
Long Yang
Yitao Liang
Zhouchen Lin
358
2
0
11 Oct 2024
MaD-Scientist: AI-based Scientist solving Convection-Diffusion-Reaction Equations Using Massive PINN-Based Prior Data
Mingu Kang
Dongseok Lee
Woojin Cho
Jaehyeon Park
Kookjin Lee
Anthony Gruber
Youngjoon Hong
Noseong Park
DiffM
AI4CE
198
1
0
09 Oct 2024
MathHay: An Automated Benchmark for Long-Context Mathematical Reasoning in LLMs
Lei Wang
Shan Dong
Yuhui Xu
Hanze Dong
Yalu Wang
Amrita Saha
Ee-Peng Lim
Caiming Xiong
Doyen Sahoo
LRM
135
5
0
07 Oct 2024
Polymath: A Challenging Multi-modal Mathematical Reasoning Benchmark
Himanshu Gupta
Shreyas Verma
Ujjwala Anantheswaran
Kevin Scaria
Mihir Parmar
Swaroop Mishra
Chitta Baral
ReLM
LRM
257
19
0
06 Oct 2024
Persona Knowledge-Aligned Prompt Tuning Method for Online Debate
Chunkit Chan
Cheng Jiayang
Xin Liu
Yauwai Yim
Yuxin Jiang
Zheye Deng
Haoran Li
Yangqiu Song
Ginny Wong
Simon See
296
0
0
05 Oct 2024
ECon: On the Detection and Resolution of Evidence Conflicts
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Cheng Jiayang
Chunkit Chan
Qianqian Zhuang
Lin Qiu
Tianhang Zhang
Tengxiao Liu
Yangqiu Song
Yue Zhang
Pengfei Liu
Zheng Zhang
260
13
0
05 Oct 2024
Grounded-VideoLLM: Sharpening Fine-grained Temporal Grounding in Video Large Language Models
Haibo Wang
Zhiyang Xu
Yu Cheng
Shizhe Diao
Jiuxiang Gu
Yixin Cao
Qifan Wang
Weifeng Ge
Lifu Huang
262
55
0
04 Oct 2024
Easy2Hard-Bench: Standardized Difficulty Labels for Profiling LLM Performance and Generalization
Neural Information Processing Systems (NeurIPS), 2024
Mucong Ding
Chenghao Deng
Jocelyn Choo
Zichu Wu
Aakriti Agrawal
...
Wanrong Zhu
Tom Goldstein
John Langford
Anima Anandkumar
Furong Huang
341
7
0
27 Sep 2024
E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding
Neural Information Processing Systems (NeurIPS), 2024
Ye Liu
Zongyang Ma
Chen Ma
Yang Wu
Ying Shan
Chang Wen Chen
267
52
0
26 Sep 2024
Constrained Reasoning Chains for Enhancing Theory-of-Mind in Large Language Models
Pacific Rim International Conference on Artificial Intelligence (PRICAI), 2024
Zizheng Lin
Chunkit Chan
Yangqiu Song
Xin Liu
LRM
244
3
0
20 Sep 2024
System 2 thinking in OpenAI's o1-preview model: Near-perfect performance on a mathematics exam
De Computis (DC), 2024
J. D. Winter
Dimitra Dodou
Y. B. Eisma
VLM
ELM
LRM
ReLM
300
20
0
19 Sep 2024
Linguini: A benchmark for language-agnostic linguistic reasoning
Eduardo Sánchez
Belen Alastruey
C. Ropers
Pontus Stenetorp
Mikel Artetxe
Marta R. Costa-jussá
ReLM
ELM
LRM
276
12
0
18 Sep 2024
Large Language Models in Drug Discovery and Development: From Disease Mechanisms to Clinical Trials
Yizhen Zheng
Huan Yee Koh
M. Yang
Li Li
Lauren T. May
Geoffrey I. Webb
Shirui Pan
George Church
LM&MA
233
46
0
06 Sep 2024
Interpreting and Improving Large Language Models in Arithmetic Calculation
International Conference on Machine Learning (ICML), 2024
Wei Zhang
Chaoqun Wan
Yonggang Zhang
Yiu-ming Cheung
Xinmei Tian
Xu Shen
Jieping Ye
LRM
323
36
0
03 Sep 2024
iToT: An Interactive System for Customized Tree-of-Thought Generation
Alan Boyle
Isha Gupta
Sebastian Hönig
Lukas Mautner
Kenza Amara
Furui Cheng
Mennatallah El-Assady
LRM
LM&Ro
191
2
0
31 Aug 2024
Do GPT Language Models Suffer From Split Personality Disorder? The Advent Of Substrate-Free Psychometrics
P. Romero
Stephen Fitz
T. Nakatsuma
141
12
0
14 Aug 2024
Evaluating and Enhancing LLMs Agent based on Theory of Mind in Guandan: A Multi-Player Cooperative Game under Imperfect Information
Yauwai Yim
Chunkit Chan
Tianyu Shi
Zheye Deng
Wei Fan
Tianshi Zheng
Yangqiu Song
LLMAG
302
20
0
05 Aug 2024
Is Your Model Really A Good Math Reasoner? Evaluating Mathematical Reasoning with Checklist
Zihao Zhou
Shudong Liu
Maizhen Ning
Wei Liu
Jindong Wang
Yang Li
Xiaowei Huang
Qiufeng Wang
Kaizhu Huang
ELM
LRM
243
45
0
11 Jul 2024
From Data to Commonsense Reasoning: The Use of Large Language Models for Explainable AI
Stefanie Krause
Frieder Stolzenburg
ELM
LRM
219
4
0
04 Jul 2024
MathOdyssey: Benchmarking Mathematical Problem-Solving Skills in Large Language Models Using Odyssey Math Data
Meng Fang
Xiangpeng Wan
Fei Lu
Fei Xing
Kai Zou
211
49
0
26 Jun 2024
A Moonshot for AI Oracles in the Sciences
Bryan Kaiser
Tailin Wu
Maike Sonnewald
Colin Thackray
Skylar Callis
AI4CE
202
1
0
25 Jun 2024
Modulating Language Model Experiences through Frictions
Katherine M. Collins
Valerie Chen
Ilia Sucholutsky
Hannah Rose Kirk
Malak Sadek
Holli Sargeant
Ameet Talwalkar
Adrian Weller
Umang Bhatt
KELM
210
9
0
24 Jun 2024
Évaluation des capacités de réponse de larges modèles de langage (LLM) pour des questions d'historiens
M. Chartier
Nabil Dakkoune
G. Bourgeois
Stéphane Jean
KELM
ELM
116
1
0
21 Jun 2024
Do Large Language Models Exhibit Cognitive Dissonance? Studying the Difference Between Revealed Beliefs and Stated Answers
Manuel Mondal
Ljiljana Dolamic
Gérôme Bovet
Philippe Cudré-Mauroux
Julien Audiffren
453
4
0
21 Jun 2024
Relational Learning in Pre-Trained Models: A Theory from Hypergraph Recovery Perspective
Yang Chen
Cong Fang
Zhouchen Lin
Bing Liu
221
2
0
17 Jun 2024
Pre-trained Large Language Models Use Fourier Features to Compute Addition
Tianyi Zhou
Deqing Fu
Willie Neiswanger
Robin Jia
LRM
263
29
0
05 Jun 2024
Alice in Wonderland: Simple Tasks Showing Complete Reasoning Breakdown in State-Of-the-Art Large Language Models
Marianna Nezhurina
Lucia Cipolina-Kun
Mehdi Cherti
J. Jitsev
LLMAG
LRM
ELM
ReLM
813
58
0
04 Jun 2024
Applying Fine-Tuned LLMs for Reducing Data Needs in Load Profile Analysis
Yi Hu
Hyeonjin Kim
Kai Ye
Ning Lu
215
18
0
02 Jun 2024
Evaluating Mathematical Reasoning of Large Language Models: A Focus on Error Identification and Correction
Xiaoyuan Li
Wenjie Wang
Moxin Li
Junrong Guo
Yang Zhang
Fuli Feng
ELM
LRM
234
41
0
02 Jun 2024
Models That Prove Their Own Correctness
Noga Amit
S. Goldwasser
Orr Paradise
G. Rothblum
LRM
448
5
0
24 May 2024
Investigating Symbolic Capabilities of Large Language Models
Neisarg Dave
Daniel Kifer
C. Lee Giles
A. Mali
ELM
LRM
167
4
0
21 May 2024
Can formal argumentative reasoning enhance LLMs performances?
Federico Castagna
I. Sassoon
Simon Parsons
LRM
LLMAG
138
3
0
16 May 2024
Exploring the Impact of ChatGPT on Wikipedia Engagement
Neal Reeves
Wenjie Yin
Elena Simperl
KELM
195
7
0
16 May 2024
The AI Companion in Education: Analyzing the Pedagogical Potential of ChatGPT in Computer Science and Engineering
Z. He
Thomas Nguyen
Tahereh Miari
Mehrdad Aliasgari
S. Rafatirad
Hossein Sayadi
107
7
0
23 Apr 2024
NegotiationToM: A Benchmark for Stress-testing Machine Theory of Mind on Negotiation Surrounding
Chunkit Chan
Cheng Jiayang
Yauwai Yim
Zheye Deng
Wei Fan
Haoran Li
Xin Liu
Hongming Zhang
Weiqi Wang
Yangqiu Song
LLMAG
236
38
0
21 Apr 2024
Large Language Models as Test Case Generators: Performance Evaluation and Enhancement
Ke-Shen Li
Shijie Cao
LLMAG
180
40
0
20 Apr 2024