SuPreME: A Supervised Pre-training Framework for Multimodal ECG Representation Learning

Abstract

Cardiovascular diseases are a leading cause of death and disability worldwide. Electrocardiogram (ECG) recordings are critical for diagnosing and monitoring cardiac health, but obtaining large-scale annotated ECG datasets is labor-intensive and time-consuming. Recent ECG Self-Supervised Learning (eSSL) methods mitigate this by learning features without extensive labels, but they fail to capture fine-grained clinical semantics and require extensive task-specific fine-tuning. To address these challenges, we propose SuPreME, a Supervised Pre-training framework for Multimodal ECG representation learning. SuPreME applies Large Language Models (LLMs) to extract structured clinical entities from free-text ECG reports, filter out noise and irrelevant content, enhance clinical representation learning, and build a high-quality, fine-grained labeled dataset. By using text-based cardiac queries instead of traditional categorical labels, SuPreME enables zero-shot classification of unseen diseases without additional fine-tuning. We evaluate SuPreME on six downstream datasets covering 127 cardiac conditions, where it surpasses state-of-the-art eSSL and multimodal methods in zero-shot AUC by over 1.96%. These results demonstrate the effectiveness of SuPreME in leveraging structured, clinically relevant knowledge to learn high-quality ECG representations. All code and data will be released upon acceptance.
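
To make the zero-shot mechanism concrete, the sketch below shows one plausible way a shared embedding space can score an ECG against text-based cardiac queries, with the highest-similarity query taken as the prediction. This is a minimal illustration under placeholder assumptions, not SuPreME's actual implementation: the encoders, embedding dimension, and tokenization are all stand-ins invented for the example.

import torch
import torch.nn.functional as F

# Placeholder encoders standing in for SuPreME's ECG and text branches
# (the real architectures are described in the paper, not here).
ecg_encoder = torch.nn.Sequential(
    torch.nn.Flatten(),               # (B, 12, 1000) -> (B, 12000)
    torch.nn.Linear(12 * 1000, 256),  # project a 12-lead ECG to a 256-d embedding
)
text_encoder = torch.nn.EmbeddingBag(num_embeddings=5000, embedding_dim=256)

def zero_shot_scores(ecg, query_tokens, query_offsets):
    # Embed both modalities and L2-normalize so the dot product is cosine similarity.
    ecg_emb = F.normalize(ecg_encoder(ecg), dim=-1)                              # (B, 256)
    query_emb = F.normalize(text_encoder(query_tokens, query_offsets), dim=-1)   # (Q, 256)
    return ecg_emb @ query_emb.T                                                 # (B, Q) similarities

# Toy usage: one ECG scored against three hypothetical cardiac queries
# (e.g. "atrial fibrillation", "left bundle branch block", "normal sinus rhythm").
ecg = torch.randn(1, 12, 1000)                 # 12 leads, 1000 samples each
tokens = torch.tensor([11, 42, 7, 99, 3, 5])   # placeholder token ids for all queries
offsets = torch.tensor([0, 2, 4])              # start index of each query's tokens
scores = zero_shot_scores(ecg, tokens, offsets)
print(scores.argmax(dim=-1))                   # index of the best-matching query

Because classes are expressed as text queries rather than fixed categorical labels, covering an unseen disease amounts to writing a new query string; no retraining or task-specific fine-tuning is implied.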

@article{cai2025_2502.19668,
  title={SuPreME: A Supervised Pre-training Framework for Multimodal ECG Representation Learning},
  author={Mingsheng Cai and Jiuming Jiang and Wenhao Huang and Che Liu and Rossella Arcucci},
  journal={arXiv preprint arXiv:2502.19668},
  year={2025}
}