This study introduces OpenECG, a large-scale benchmark of 1.2 million 12-lead ECG recordings from nine centers, to evaluate ECG foundation models (ECG-FMs) trained on public datasets. We investigate three self-supervised learning methods (SimCLR, BYOL, MAE) with ResNet-50 and Vision Transformer architectures, assessing model generalization through leave-one-dataset-out experiments and data scaling analysis. Results show that pre-training on diverse datasets significantly improves generalization, with BYOL and MAE outperforming SimCLR, highlighting the efficacy of feature-consistency and generative learning over contrastive approaches. Data scaling experiments reveal that performance saturates at 60-70% of the total data for BYOL and MAE, while SimCLR requires substantially more data to reach comparable performance. These findings demonstrate that publicly available ECG data can match or surpass proprietary datasets in training robust ECG-FMs, paving the way for scalable, clinically meaningful AI-driven ECG analysis.
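Since only the abstract is available here, the following Python sketch is purely illustrative of the leave-one-dataset-out protocol it describes: the center names and the pretrain_ssl / evaluate_downstream helpers are hypothetical placeholders, not the OpenECG implementation.

    # Minimal sketch of a leave-one-dataset-out (LODO) evaluation loop.
    # All names (DATASETS, pretrain_ssl, evaluate_downstream) are assumed
    # placeholders; they do not come from the OpenECG codebase.

    DATASETS = [f"center_{i}" for i in range(1, 10)]  # nine contributing centers

    def pretrain_ssl(train_sets, method):
        """Placeholder: self-supervised pre-training (SimCLR, BYOL, or MAE)
        on all centers except the held-out one."""
        return {"method": method, "pretrained_on": tuple(train_sets)}

    def evaluate_downstream(model, held_out):
        """Placeholder: fine-tune or linear-probe the pretrained model on the
        held-out center; a real run would return a metric such as macro-AUROC."""
        return 0.0

    def leave_one_dataset_out(method="byol"):
        scores = {}
        for held_out in DATASETS:
            train_sets = [d for d in DATASETS if d != held_out]
            model = pretrain_ssl(train_sets, method)
            scores[held_out] = evaluate_downstream(model, held_out)
        return scores

    if __name__ == "__main__":
        for center, score in leave_one_dataset_out().items():
            print(f"held out {center}: score={score:.3f}")

Each held-out center thus serves once as an unseen test distribution, which is what makes the protocol a measure of cross-center generalization rather than in-distribution fit.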
@article{wan2025_2503.00711,
  title={OpenECG: Benchmarking ECG Foundation Models with Public 1.2 Million Records},
  author={Zhijiang Wan and Qianhao Yu and Jia Mao and Wenfeng Duan and Cheng Ding},
  journal={arXiv preprint arXiv:2503.00711},
  year={2025}
}