ECG-Expert-QA: A Benchmark for Evaluating Medical Large Language Models in Heart Disease Diagnosis

We present ECG-Expert-QA, a comprehensive multimodal dataset for evaluating diagnostic capabilities in electrocardiogram (ECG) interpretation. It combines real-world clinical ECG data with systematically generated synthetic cases, covering 12 essential diagnostic tasks and totaling 47,211 expert-validated QA pairs. These encompass diverse clinical scenarios, from basic rhythm recognition to complex diagnoses involving rare conditions and temporal changes. A key innovation is the support for multi-turn dialogues, enabling the development of conversational medical AI systems that emulate clinician-patient or interprofessional interactions. This allows for more realistic assessment of AI models' clinical reasoning, diagnostic accuracy, and knowledge integration. Constructed through a knowledge-guided framework with strict quality control, ECG-Expert-QA ensures linguistic and clinical consistency, making it a high-quality resource for advancing AI-assisted ECG interpretation. It challenges models with tasks like identifying subtle ischemic changes and interpreting complex arrhythmias in context-rich scenarios. To promote research transparency and collaboration, the dataset, accompanying code, and prompts are publicly released atthis https URL
View on arXiv@article{wang2025_2502.17475, title={ ECG-Expert-QA: A Benchmark for Evaluating Medical Large Language Models in Heart Disease Diagnosis }, author={ Xu Wang and Jiaju Kang and Puyu Han and Yubao Zhao and Qian Liu and Liwenfei He and Lingqiong Zhang and Lingyun Dai and Yongcheng Wang and Jie Tao }, journal={arXiv preprint arXiv:2502.17475}, year={ 2025 } }