Exploring the Feasibility of Multimodal Chatbot AI as Copilot in Pathology Diagnostics: Generalist Model's Pitfall

4 September 2024

Mianxin Liu

Shaoting Zhang

Zhe Wang

LM&MA

Main:15 Pages

7 Figures

Abstract

Pathology images are crucial for diagnosing and managing various diseases by visualizing cellular and tissue-level abnormalities. Recent advancements in artificial intelligence (AI), particularly multimodal models like ChatGPT, have shown promise in transforming medical image analysis through capabilities such as medical vision-language question answering. However, there remains a significant gap in integrating pathology image data with these AI models for clinical applications. This study benchmarks the performance of GPT on pathology images, assessing its diagnostic accuracy and efficiency on real-world clinical records. We observe significant deficits of GPT in bone diseases and fair performance in diseases from three other systems. Despite offering satisfactory abnormality annotations, GPT exhibits a consistent disadvantage in terminology accuracy and multimodal integration. Specifically, we demonstrate GPT's failures in interpreting immunohistochemistry results and diagnosing metastatic cancers. This study highlights the weaknesses of the current generalist GPT model and contributes to the integration of pathology and advanced AI.
