2301.13867
Mathematical Capabilities of ChatGPT
31 January 2023
Simon Frieder
Luca Pinchetti
Alexis Chevalier
Ryan-Rhys Griffiths
Tommaso Salvatori
Thomas Lukasiewicz
P. Petersen
Julius Berner
ELM
AI4MH
Papers citing "Mathematical Capabilities of ChatGPT"
50 / 198 papers shown
Title
Combining Bayesian Inference and Reinforcement Learning for Agent Decision Making: A Review
Chengmin Zhou
Ville Kyrki
P. Fränti
Laura Ruotsalainen
BDL
AI4CE
27
0
0
12 May 2025
Mathematical Capabilities of Large Language Models in Finnish Matriculation Examination
Mika Setälä
Pieta Sikström
Ville Heilala
T. Karkkainen
ELM
LRM
23
1
0
15 Apr 2025
Physics-informed KAN PointNet: Deep learning for simultaneous solutions to inverse problems in incompressible flow on numerous irregular geometries
Ali Kashefi
T. Mukerji
3DPC
PINN
47
0
0
08 Apr 2025
Brains vs. Bytes: Evaluating LLM Proficiency in Olympiad Mathematics
Hamed Mahdavi
Alireza Hashemi
Majid Daliri
Pegah Mohammadipour
Alireza Farhadi
Samira Malek
Yekta Yazdanifard
Amir Khasahmadi
V. Honavar
ELM
LRM
52
1
0
01 Apr 2025
Integrating Artificial Intelligence with Human Expertise: An In-depth Analysis of ChatGPT's Capabilities in Generating Metamorphic Relations
Yifan Zhang
Dave Towey
Matthew Pike
Q. Luu
Huai Liu
T. Chen
29
0
0
28 Mar 2025
Boosting Large Language Models with Mask Fine-Tuning
M. Zhang
Yue Bai
Huan Wang
Yizhou Wang
Qihua Dong
Y. Fu
CLL
48
0
0
27 Mar 2025
Proof or Bluff? Evaluating LLMs on 2025 USA Math Olympiad
Ivo Petrov
Jasper Dekoninck
Lyuben Baltadzhiev
Maria Drencheva
Kristian Minchev
Mislav Balunović
Nikola Jovanović
Martin Vechev
LRM
ELM
62
8
0
27 Mar 2025
ORION: A Holistic End-to-End Autonomous Driving Framework by Vision-Language Instructed Action Generation
Haoyu Fu
Diankun Zhang
Zongchuang Zhao
Jianfeng Cui
Dingkang Liang
Chong Zhang
Dingyuan Zhang
Hongwei Xie
Bing Wang
Xiang Bai
38
2
0
25 Mar 2025
MathFlow: Enhancing the Perceptual Flow of MLLMs for Visual Mathematical Problems
Felix Chen
Hangjie Yuan
Yunqiu Xu
Tao Feng
Jun Cen
Pengwei Liu
Zeying Huang
Yi Yang
LRM
42
1
0
19 Mar 2025
Unlocking Learning Potentials: The Transformative Effect of Generative AI in Education Across Grade Levels
Meijuan Xie
Liling Luo
39
0
0
15 Mar 2025
Understanding the Logical Capabilities of Large Language Models via Out-of-Context Representation Learning
Jonathan Shaki
Emanuele La Malfa
Michael Wooldridge
Sarit Kraus
LRM
ReLM
64
0
0
13 Mar 2025
Numerical Error Analysis of Large Language Models
Stanislav Budzinskiy
Wenyi Fang
Longbin Zeng
Philipp Petersen
37
1
0
13 Mar 2025
Unveiling the Mathematical Reasoning in DeepSeek Models: A Comparative Study of Large Language Models
Afrar Jahin
Arif Hassan Zidan
Yu Bao
Shizhe Liang
T. Liu
W. Zhang
LRM
61
1
0
13 Mar 2025
SciHorizon: Benchmarking AI-for-Science Readiness from Scientific Data to Large Language Models
Chuan Qin
X. Chen
Chengrui Wang
Pengmin Wu
Xi Chen
...
Han Wu
C. Li
Yuanchun Zhou
H. Xiong
Hengshu Zhu
ELM
57
1
0
12 Mar 2025
Measure Twice, Cut Once: Grasping Video Structures and Event Semantics with LLMs for Video Temporal Localization
Zongshang Pang
Mayu Otani
Yuta Nakashima
51
0
0
12 Mar 2025
AlphaDrive: Unleashing the Power of VLMs in Autonomous Driving via Reinforcement Learning and Reasoning
Bo Jiang
Shaoyu Chen
Qian Zhang
Wenyu Liu
Xinggang Wang
OffRL
LRM
VLM
71
2
0
10 Mar 2025
PiCO: Peer Review in LLMs based on the Consistency Optimization
Kun-Peng Ning
Shuo Yang
Yu-Yang Liu
Jia-Yu Yao
Zhen-Hui Liu
Yu Wang
Ming Pang
Li Yuan
ALM
69
8
0
24 Feb 2025
Curie: Toward Rigorous and Automated Scientific Experimentation with AI Agents
Patrick Tser Jern Kon
Jiachen Liu
Qiuyi Ding
Yiming Qiu
Zhenning Yang
Yibo Huang
Jayanth Srinivasa
Myungjin Lee
Mosharaf Chowdhury
Ang Chen
51
3
0
22 Feb 2025
Testing GPT-4 with Wolfram Alpha and Code Interpreter plug-ins on math and science problems
E. Davis
S. Aaronson
ELM
122
21
0
21 Feb 2025
On the robustness of ChatGPT in teaching Korean Mathematics
Phuong-Nam Nguyen
Quang Nguyen-The
An Vu-Minh
Diep-Anh Nguyen
Xuan-Lam Pham
RALM
37
0
0
17 Feb 2025
Selective Response Strategies for GenAI
Boaz Taitler
Omer Ben-Porat
66
1
0
02 Feb 2025
Tutor CoPilot: A Human-AI Approach for Scaling Real-Time Expertise
Rose E. Wang
Ana T. Ribeiro
Carly Robinson
Susanna Loeb
Dora Demszky
60
11
0
28 Jan 2025
WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct
Haipeng Luo
Qingfeng Sun
Can Xu
Pu Zhao
Jian-Guang Lou
...
Xiubo Geng
Qingwei Lin
Shifeng Chen
Yansong Tang
Dongmei Zhang
OSLM
LRM
108
406
0
03 Jan 2025
Formal Mathematical Reasoning: A New Frontier in AI
Kaiyu Yang
Gabriel Poesia
Jingxuan He
Wenda Li
Kristin Lauter
Swarat Chaudhuri
Dawn Song
LRM
AI4CE
82
20
0
20 Dec 2024
INCLUDE: Evaluating Multilingual Language Understanding with Regional Knowledge
Angelika Romanou
Negar Foroutan
Anna Sotnikova
Zeming Chen
Sree Harsha Nelaturu
...
Mike Zhang
Imanol Schlag
Marzieh Fadaee
Sara Hooker
Antoine Bosselut
ELM
105
6
0
29 Nov 2024
Embracing AI in Education: Understanding the Surge in Large Language Model Use by Secondary Students
Tiffany Zhu
Kexun Zhang
William Yang Wang
SyDa
ELM
AI4Ed
62
3
0
27 Nov 2024
ChatGPT in Research and Education: Exploring Benefits and Threats
Abu Saleh Musa Miah
Md Mahbubur Rahman Tusher
Md. Moazzem Hossain
Md Mamun Hossain
M. Rahim
Md Ekramul Hamid
M. Islam
Jungpil Shin
24
1
0
05 Nov 2024
Senna: Bridging Large Vision-Language Models and End-to-End Autonomous Driving
Bo Jiang
Shaoyu Chen
Bencheng Liao
Xingyu Zhang
Wei Yin
Qian Zhang
Chang Huang
W. Liu
X. Wang
VLM
MLLM
LRM
35
12
0
29 Oct 2024
NewTerm: Benchmarking Real-Time New Terms for Large Language Models with Annual Updates
Hexuan Deng
Wenxiang Jiao
Xuebo Liu
Min Zhang
Zhaopeng Tu
36
2
0
28 Oct 2024
Interchangeable Token Embeddings for Extendable Vocabulary and Alpha-Equivalence
İlker Işık
R. G. Cinbis
Ebru Aydin Gol
26
0
0
22 Oct 2024
An Automatic and Cost-Efficient Peer-Review Framework for Language Generation Evaluation
Junjie Chen
Weihang Su
Zhumin Chu
Haitao Li
Qinyao Ai
Yiqun Liu
Min Zhang
Shaoping Ma
29
3
0
16 Oct 2024
QUITE: Quantifying Uncertainty in Natural Language Text in Bayesian Reasoning Scenarios
Timo Pierre Schrader
Lukas Lange
Simon Razniewski
Annemarie Friedrich
UQLM
25
0
0
14 Oct 2024
HARDMath: A Benchmark Dataset for Challenging Problems in Applied Mathematics
Jingxuan Fan
Sarah Martinson
Erik Y. Wang
Kaylie Hausknecht
Jonah Brenner
Danxian Liu
Nianli Peng
Corey Wang
Michael P. Brenner
26
5
0
13 Oct 2024
Low-Dimension-to-High-Dimension Generalization And Its Implications for Length Generalization
Yang Chen
Yitao Liang
Zhouchen Lin
21
1
0
11 Oct 2024
MaD-Scientist: AI-based Scientist solving Convection-Diffusion-Reaction Equations Using Massive PINN-Based Prior Data
Mingu Kang
Dongseok Lee
Woojin Cho
Jaehyeon Park
Kookjin Lee
Anthony Gruber
Youngjoon Hong
Noseong Park
DiffM
AI4CE
29
0
0
09 Oct 2024
MathHay: An Automated Benchmark for Long-Context Mathematical Reasoning in LLMs
Lei Wang
Shan Dong
Yuhui Xu
Hanze Dong
Yalu Wang
Amrita Saha
Ee-Peng Lim
Caiming Xiong
Doyen Sahoo
LRM
40
1
0
07 Oct 2024
Polymath: A Challenging Multi-modal Mathematical Reasoning Benchmark
Himanshu Gupta
Shreyas Verma
Ujjwala Anantheswaran
Kevin Scaria
Mihir Parmar
Swaroop Mishra
Chitta Baral
ReLM
LRM
24
4
0
06 Oct 2024
Persona Knowledge-Aligned Prompt Tuning Method for Online Debate
Chunkit Chan
Cheng Jiayang
Xin Liu
Yauwai Yim
Yuxin Jiang
Zheye Deng
Haoran Li
Yangqiu Song
Ginny Y. Wong
Simon See
34
0
0
05 Oct 2024
ECon: On the Detection and Resolution of Evidence Conflicts
Cheng Jiayang
Chunkit Chan
Qianqian Zhuang
Lin Qiu
Tianhang Zhang
Tengxiao Liu
Yangqiu Song
Yue Zhang
Pengfei Liu
Zheng Zhang
36
1
0
05 Oct 2024
Grounded-VideoLLM: Sharpening Fine-grained Temporal Grounding in Video Large Language Models
Haibo Wang
Zhiyang Xu
Yu Cheng
Shizhe Diao
Yufan Zhou
Yixin Cao
Qifan Wang
Weifeng Ge
Lifu Huang
22
20
0
04 Oct 2024
Easy2Hard-Bench: Standardized Difficulty Labels for Profiling LLM Performance and Generalization
Mucong Ding
Chenghao Deng
Jocelyn Choo
Zichu Wu
Aakriti Agrawal
...
Tianyi Zhou
Tom Goldstein
John Langford
Anima Anandkumar
Furong Huang
51
5
0
27 Sep 2024
E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding
Ye Liu
Zongyang Ma
Zhongang Qi
Yang Wu
Ying Shan
Chang Wen Chen
31
16
0
26 Sep 2024
Constrained Reasoning Chains for Enhancing Theory-of-Mind in Large Language Models
Zizheng Lin
Chunkit Chan
Yangqiu Song
Xin Liu
LRM
26
1
0
20 Sep 2024
System 2 thinking in OpenAI's o1-preview model: Near-perfect performance on a mathematics exam
J. D. Winter
Dimitra Dodou
Y. B. Eisma
VLM
ELM
LRM
ReLM
27
9
0
19 Sep 2024
Linguini: A benchmark for language-agnostic linguistic reasoning
Eduardo Sánchez
Belen Alastruey
C. Ropers
Pontus Stenetorp
Mikel Artetxe
Marta R. Costa-jussá
ReLM
ELM
LRM
42
6
0
18 Sep 2024
Large Language Models in Drug Discovery and Development: From Disease Mechanisms to Clinical Trials
Yizhen Zheng
Huan Yee Koh
M. Yang
Li Li
Lauren T. May
Geoffrey I. Webb
Shirui Pan
George Church
LM&MA
42
9
0
06 Sep 2024
Interpreting and Improving Large Language Models in Arithmetic Calculation
Wei Zhang
Chaoqun Wan
Yonggang Zhang
Yiu-ming Cheung
Xinmei Tian
Xu Shen
Jieping Ye
LRM
24
18
0
03 Sep 2024
iToT: An Interactive System for Customized Tree-of-Thought Generation
Alan Boyle
Isha Gupta
Sebastian Hönig
Lukas Mautner
Kenza Amara
Furui Cheng
Mennatallah El-Assady
LRM
LM&Ro
32
1
0
31 Aug 2024
Do GPT Language Models Suffer From Split Personality Disorder? The Advent Of Substrate-Free Psychometrics
P. Romero
Stephen Fitz
T. Nakatsuma
25
10
0
14 Aug 2024
Evaluating and Enhancing LLMs Agent based on Theory of Mind in Guandan: A Multi-Player Cooperative Game under Imperfect Information
Yauwai Yim
Chunkit Chan
Tianyu Shi
Zheye Deng
Wei Fan
Tianshi Zheng
Yangqiu Song
LLMAG
23
9
0
05 Aug 2024