
Mathematical Capabilities of ChatGPT

Neural Information Processing Systems (NeurIPS), 2023

31 January 2023

Papers citing "Mathematical Capabilities of ChatGPT"

50 / 227 papers shown

On the robustness of ChatGPT in teaching Korean Mathematics

133

17 Feb 2025

Selective Response Strategies for GenAI

Boaz Taitler

Omer Ben-Porat

376

02 Feb 2025

Tutor CoPilot: A Human-AI Approach for Scaling Real-Time Expertise

373

28 Jan 2025

ChartInsighter: An Approach for Mitigating Hallucination in Time-series Chart Summary Generation with A Benchmark Dataset

IEEE Transactions on Visualization and Computer Graphics (TVCG), 2025

225

17 Jan 2025

Formal Mathematical Reasoning: A New Frontier in AI

402

20 Dec 2024

INCLUDE: Evaluating Multilingual Language Understanding with Regional Knowledge

International Conference on Learning Representations (ICLR), 2024

...

408

29 Nov 2024

Embracing AI in Education: Understanding the Surge in Large Language Model Use by Secondary Students

194

27 Nov 2024

ChatGPT in Research and Education: Exploring Benefits and Threats

Abu Saleh Musa Miah

Md Mahbubur Rahman Tusher

110

05 Nov 2024

Senna: Bridging Large Vision-Language Models and End-to-End Autonomous Driving

Chang Huang

308

29 Oct 2024

NewTerm: Benchmarking Real-Time New Terms for Large Language Models with Annual Updates

Neural Information Processing Systems (NeurIPS), 2024

253

28 Oct 2024

Interchangeable Token Embeddings for Extendable Vocabulary and Alpha-Equivalence

İlker Işık

R. G. Cinbis

Ebru Aydin Gol

435

22 Oct 2024

Auto-PRE: An Automatic and Cost-Efficient Peer-Review Framework for Language Generation Evaluation

...

198

16 Oct 2024

QUITE: Quantifying Uncertainty in Natural Language Text in Bayesian Reasoning Scenarios

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024

318

14 Oct 2024

HARDMath: A Benchmark Dataset for Challenging Problems in Applied Mathematics

International Conference on Learning Representations (ICLR), 2024

137

13 Oct 2024

Low-Dimension-to-High-Dimension Generalization And Its Implications for Length Generalization

359

11 Oct 2024

MaD-Scientist: AI-based Scientist solving Convection-Diffusion-Reaction Equations Using Massive PINN-Based Prior Data

Anthony Gruber

198

09 Oct 2024

MathHay: An Automated Benchmark for Long-Context Mathematical Reasoning in LLMs

Lei Wang

Hanze Dong

Caiming Xiong

135

07 Oct 2024

Polymath: A Challenging Multi-modal Mathematical Reasoning Benchmark

Himanshu Gupta

Shreyas Verma

Ujjwala Anantheswaran

Swaroop Mishra

257

06 Oct 2024

Persona Knowledge-Aligned Prompt Tuning Method for Online Debate

Haoran Li

Yangqiu Song

Ginny Wong

Simon See

296

05 Oct 2024

ECon: On the Detection and Resolution of Evidence Conflicts

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024

Tengxiao Liu

Yangqiu Song

Yue Zhang

Pengfei Liu

Zheng Zhang

260

05 Oct 2024

Grounded-VideoLLM: Sharpening Fine-grained Temporal Grounding in Video Large Language Models

262

04 Oct 2024

Easy2Hard-Bench: Standardized Difficulty Labels for Profiling LLM Performance and Generalization

Neural Information Processing Systems (NeurIPS), 2024

...

341

27 Sep 2024

E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding

Neural Information Processing Systems (NeurIPS), 2024

Ye Liu

Zongyang Ma

Chen Ma

Yang Wu

Ying Shan

Chang Wen Chen

273

26 Sep 2024

Constrained Reasoning Chains for Enhancing Theory-of-Mind in Large Language Models

Pacific Rim International Conference on Artificial Intelligence (PRICAI), 2024

Yangqiu Song

244

20 Sep 2024

System 2 thinking in OpenAI's o1-preview model: Near-perfect performance on a mathematics exam

De Computis (DC), 2024

302

19 Sep 2024

Linguini: A benchmark for language-agnostic linguistic reasoning

276

18 Sep 2024

Large Language Models in Drug Discovery and Development: From Disease Mechanisms to Clinical Trials

Yizhen Zheng

Geoffrey I. Webb

238

06 Sep 2024

Interpreting and Improving Large Language Models in Arithmetic Calculation

International Conference on Machine Learning (ICML), 2024

Wei Zhang

Chaoqun Wan

Yonggang Zhang

Yiu-ming Cheung

Xinmei Tian

Xu Shen

Jieping Ye

LRM

332

03 Sep 2024

iToT: An Interactive System for Customized Tree-of-Thought Generation

Mennatallah El-Assady

LRM LM&Ro

191

31 Aug 2024

Do GPT Language Models Suffer From Split Personality Disorder? The Advent Of Substrate-Free Psychometrics

P. Romero

Stephen Fitz

T. Nakatsuma

143

14 Aug 2024

Evaluating and Enhancing LLMs Agent based on Theory of Mind in Guandan: A Multi-Player Cooperative Game under Imperfect Information

Yangqiu Song

305

05 Aug 2024

Is Your Model Really A Good Math Reasoner? Evaluating Mathematical Reasoning with Checklist

Xiaowei Huang

Qiufeng Wang

Kaizhu Huang

ELM LRM

254

11 Jul 2024

From Data to Commonsense Reasoning: The Use of Large Language Models for Explainable AI

Stefanie Krause

Frieder Stolzenburg

ELM LRM

219

04 Jul 2024

MathOdyssey: Benchmarking Mathematical Problem-Solving Skills in Large Language Models Using Odyssey Math Data

212

26 Jun 2024

A Moonshot for AI Oracles in the Sciences

202

25 Jun 2024

Modulating Language Model Experiences through Frictions

Katherine M. Collins

Adrian Weller

211

24 Jun 2024

Évaluation des capacités de réponse de larges modèles de langage (LLM) pour des questions d'historiens

120

21 Jun 2024

Do Large Language Models Exhibit Cognitive Dissonance? Studying the Difference Between Revealed Beliefs and Stated Answers

Manuel Mondal

Ljiljana Dolamic

Gérôme Bovet

Philippe Cudré-Mauroux

Julien Audiffren

457

21 Jun 2024

Relational Learning in Pre-Trained Models: A Theory from Hypergraph Recovery Perspective

224

17 Jun 2024

Pre-trained Large Language Models Use Fourier Features to Compute Addition

264

05 Jun 2024

Alice in Wonderland: Simple Tasks Showing Complete Reasoning Breakdown in State-Of-the-Art Large Language Models

820

04 Jun 2024

Applying Fine-Tuned LLMs for Reducing Data Needs in Load Profile Analysis

219

02 Jun 2024

Evaluating Mathematical Reasoning of Large Language Models: A Focus on Error Identification and Correction

242

02 Jun 2024

Models That Prove Their Own Correctness

451

24 May 2024

Investigating Symbolic Capabilities of Large Language Models

167

21 May 2024

Can formal argumentative reasoning enhance LLMs performances?

138

16 May 2024

Exploring the Impact of ChatGPT on Wikipedia Engagement

196

16 May 2024

The AI Companion in Education: Analyzing the Pedagogical Potential of ChatGPT in Computer Science and Engineering

108

23 Apr 2024

NegotiationToM: A Benchmark for Stress-testing Machine Theory of Mind on Negotiation Surrounding

236

21 Apr 2024

Large Language Models as Test Case Generators: Performance Evaluation and Enhancement

Ke-Shen Li

Shijie Cao

LLMAG

181

20 Apr 2024