
Mathematical Capabilities of ChatGPT

Neural Information Processing Systems (NeurIPS), 2023

31 January 2023

Papers citing "Mathematical Capabilities of ChatGPT"

50 / 227 papers shown

Can LLMs Understand Computer Networks? Towards a Virtual System Administrator

296

19 Apr 2024

A Survey on Deep Learning for Theorem Proving

290

15 Apr 2024

Online Safety Analysis for LLMs: a Benchmark, an Assessment, and a Path Forward

385

12 Apr 2024

Capabilities of Large Language Models in Control Engineering: A Benchmark Study on GPT-4, Claude 3 Opus, and Gemini 1.0 Ultra

Geir Dullerud

210

04 Apr 2024

From Large to Tiny: Distilling and Refining Mathematical Expertise for Math Word Problems with Weakly Supervision

313

21 Mar 2024

Review of Generative AI Methods in Cybersecurity

458

13 Mar 2024

Human I/O: Towards a Unified Approach to Detecting Situational Impairments

234

06 Mar 2024

Chaining thoughts and LLMs to learn DNA structural biophysics

Tyler D. Ross

Ashwin Gopinath

AI4CE

132

02 Mar 2024

Large Language Models and Games: A Survey and Roadmap

Georgios N. Yannakakis

LLMAG LM&MA AI4CE LRM

489

135

28 Feb 2024

A New Era in LLM Security: Exploring Security Concerns in Real-World LLM-based Systems

258

101

28 Feb 2024

WIPI: A New Web Threat for LLM-Driven Web Agents

Yulong Cao

242

26 Feb 2024

How Large Language Models Encode Context Knowledge? A Layer-Wise Probing Study

234

25 Feb 2024

OlympiadBench: A Challenging Benchmark for Promoting AGI with Olympiad-Level Bilingual Multimodal Scientific Problems

...

Yuxiang Zhang

Jie Liu

Lei Qi

Zhiyuan Liu

Maosong Sun

ELM AIMat

408

690

21 Feb 2024

FormulaReasoning: A Dataset for Formula-Based Numerical Reasoning

611

20 Feb 2024

Language Models as Science Tutors

Alexander Wettig

...

251

16 Feb 2024

UrbanKGent: A Unified Large Language Model Agent Framework for Urban Knowledge Graph Construction

Neural Information Processing Systems (NeurIPS), 2024

Yansong Ning

Hao Liu

LLMAG

262

10 Feb 2024

Leak, Cheat, Repeat: Data Contamination and Evaluation Malpractices in Closed-Source LLMs

Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2024

452

259

06 Feb 2024

Large Language Models for Mathematical Reasoning: Progresses and Challenges

363

271

31 Jan 2024

ChatGPT in the classroom. Exploring its potential and limitations in a Functional Programming course

International journal of human computer interactions (IJHCI), 2023

Dan-Matei Popovici

158

20 Jan 2024

Code Simulation Challenges for Large Language Models

311

17 Jan 2024

Stability Analysis of ChatGPT-based Sentiment Analysis in AI Quality Assurance

Tinghui Ouyang

AprilPyone Maungmaung

Koichi Konishi

Yoshiki Seo

Isao Echizen

AI4MH

198

15 Jan 2024

Exploring the Reasoning Abilities of Multimodal Large Language Models (MLLMs): A Comprehensive Survey on Emerging Trends in Multimodal Reasoning

Jianbo Yuan

Hongxia Yang

316

146

10 Jan 2024

AI Hallucinations: A Misnomer Worth Clarifying

Conference on Algebraic Informatics (CAI), 2024

Negar Maleki

Balaji Padmanabhan

Kaushik Dutta

447

102

09 Jan 2024

Computational Argumentation-based Chatbots: a Survey

317

07 Jan 2024

Self-Contrast: Better Reflection Through Inconsistent Solving Perspectives

Annual Meeting of the Association for Computational Linguistics (ACL), 2024

518

04 Jan 2024

NPHardEval: Dynamic Benchmark on Reasoning Ability of Large Language Models via Complexity Classes

Lizhou Fan

359

22 Dec 2023

Assessing the Impact of Prompting Methods on ChatGPT's Mathematical Capabilities

...

342

22 Dec 2023

Evaluating AI Vocational Skills Through Professional Testing

David Noever

Matt Ciolino

ELM

130

17 Dec 2023

Exploring Large Language Models in Resolving Environment-Related Crash Bugs: Localizing and Repairing

165

16 Dec 2023

Early ChatGPT User Portrait through the Lens of Data

Yuyang Deng

Ni Zhao

Xin Huang

142

10 Dec 2023

Exploring the Limits of ChatGPT in Software Security Applications

229

08 Dec 2023

DeceptPrompt: Exploiting LLM-driven Code Generation via Adversarial Natural Language Instructions

Fangzhou Wu

Xiaogeng Liu

Chaowei Xiao

AAML SILM

311

07 Dec 2023

Large Language Models for Mathematicians

221

07 Dec 2023

InteraSSort: Interactive Assortment Planning Using Large Language Models

Social Science Research Network (SSRN), 2023

Saketh Reddy Karra

Theja Tulabandhula

173

20 Nov 2023

Exploring the Potential of Large Language Models in Computational Argumentation

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

285

15 Nov 2023

When does In-context Learning Fall Short and Why? A Study on Specification-Heavy Tasks

Hao Peng

Xiaozhi Wang

...

Bin Xu

Lei Hou

Juanzi Li

261

15 Nov 2023

Adversarial Preference Optimization: Enhancing Your Alignment via RM-LLM Game

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

713

14 Nov 2023

Everything of Thoughts: Defying the Law of Penrose Triangle for Thought Generation

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

Wei Zhang

382

07 Nov 2023

An Interdisciplinary Outlook on Large Language Models for Scientific Research

...

Anastasia Visheratina

Xin Xie

243

03 Nov 2023

The Expressibility of Polynomial based Attention Scheme

Zhao Song

Guangyi Xu

Junze Yin

323

30 Oct 2023

The Eval4NLP 2023 Shared Task on Prompting Large Language Models as Explainable Metrics

331

30 Oct 2023

Enhancing Chemistry Learning with ChatGPT, Bing Chat, Bard, and Claude as Agents-to-Think-With: A Comparative Case Study

Social Science Research Network (SSRN), 2023

Renato P. dos Santos

123

23 Oct 2023

LUNA: A Model-Based Universal Analysis Framework for Large Language Models

IEEE Transactions on Software Engineering (TSE), 2023

351

22 Oct 2023

AI for Mathematics: A Cognitive Science Perspective

Cedegao E. Zhang

Katherine M. Collins

Adrian Weller

Joshua B. Tenenbaum

208

19 Oct 2023

Can Large Language Models Explain Themselves? A Study of LLM-Generated Self-Explanations

382

109

17 Oct 2023

Large Language Models Meet Open-World Intent Discovery and Recognition: An Evaluation of ChatGPT

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

Weiran Xu

223

16 Oct 2023

GLoRE: Evaluating Logical Reasoning of Large Language Models

Yue Zhang

386

13 Oct 2023

Can GPT models be Financial Analysts? An Evaluation of ChatGPT and GPT-4 on mock CFA Exams

238

12 Oct 2023

A New Benchmark and Reverse Validation Method for Passage-level Hallucination Detection

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

300

10 Oct 2023

OptiMUS: Optimization Modeling Using MIP Solvers and large language models

Ali AhmadiTeshnizi

Wenzhi Gao

Madeleine Udell

LLMAG

143

09 Oct 2023