
ECG-Expert-QA: A Benchmark for Evaluating Medical Large Language Models in Heart Disease Diagnosis

Xu Wang
Jiaju Kang
Puyu Han
Yubao Zhao
Qian Liu
Liwenfei He
Lingqiong Zhang
Lingyun Dai
Yongcheng Wang
Jie Tao
Abstract

We present ECG-Expert-QA, a comprehensive multimodal dataset for evaluating diagnostic capabilities in electrocardiogram (ECG) interpretation. It combines real-world clinical ECG data with systematically generated synthetic cases, covering 12 essential diagnostic tasks and totaling 47,211 expert-validated QA pairs. These encompass diverse clinical scenarios, from basic rhythm recognition to complex diagnoses involving rare conditions and temporal changes. A key innovation is support for multi-turn dialogues, enabling the development of conversational medical AI systems that emulate clinician-patient or interprofessional interactions. This allows for a more realistic assessment of AI models' clinical reasoning, diagnostic accuracy, and knowledge integration. Constructed through a knowledge-guided framework with strict quality control, ECG-Expert-QA ensures linguistic and clinical consistency, making it a high-quality resource for advancing AI-assisted ECG interpretation. It challenges models with tasks such as identifying subtle ischemic changes and interpreting complex arrhythmias in context-rich scenarios. To promote research transparency and collaboration, the dataset, accompanying code, and prompts are publicly released at this https URL.

@article{wang2025_2502.17475,
  title={ECG-Expert-QA: A Benchmark for Evaluating Medical Large Language Models in Heart Disease Diagnosis},
  author={Xu Wang and Jiaju Kang and Puyu Han and Yubao Zhao and Qian Liu and Liwenfei He and Lingqiong Zhang and Lingyun Dai and Yongcheng Wang and Jie Tao},
  journal={arXiv preprint arXiv:2502.17475},
  year={2025}
}