
Domain-invariant Clinical Representation Learning by Bridging Data Distribution Shift across EMR Datasets

Abstract

Emerging diseases present challenges in symptom recognition and timely clinical intervention due to limited available information. An effective prognostic model could assist physicians in making accurate diagnoses and designing personalized treatment plans to prevent adverse outcomes. However, in the early stages of disease emergence, several factors hamper model development: limited data collection, insufficient clinical experience, and privacy and ethical concerns all restrict data availability and complicate accurate label assignment. Furthermore, Electronic Medical Record (EMR) data from different diseases or sources often exhibit significant cross-dataset feature misalignment, which severely reduces the effectiveness of deep learning models. We present a domain-invariant representation learning method that constructs a transition model between source and target datasets. By constraining the distribution shift of features generated across different domains, we capture domain-invariant features that are specifically relevant to downstream tasks, and we develop a unified domain-invariant encoder that achieves better feature representation across various task domains. Experimental results across multiple target tasks demonstrate that our proposed model surpasses competing baseline methods and converges faster during training, particularly when data are limited. Extensive experiments validate our method's effectiveness in providing more accurate predictions for emerging pandemics and other diseases. Code is publicly available at this https URL.
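The abstract does not say how the distribution shift between domains is constrained. As a rough illustration only, the sketch below uses a maximum mean discrepancy (MMD) penalty with an RBF kernel, a common choice for this kind of alignment. It is a technique swapped in for illustration and is not claimed to be the authors' method. The names `rbf_kernel`, `mmd_squared`, `total_loss`, and the parameters `gamma` and `weight` are all hypothetical.

```python
import math
import random


def rbf_kernel(x, y, gamma=0.1):
    """RBF kernel between two feature vectors (gamma is an assumed bandwidth)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def mmd_squared(source, target, gamma=0.1):
    """Biased estimate of squared MMD between two sets of feature vectors."""

    def mean_kernel(xs, ys):
        total = sum(rbf_kernel(x, y, gamma) for x in xs for y in ys)
        return total / (len(xs) * len(ys))

    return (
        mean_kernel(source, source)
        + mean_kernel(target, target)
        - 2.0 * mean_kernel(source, target)
    )


def total_loss(task_loss, source_feats, target_feats, weight=0.1):
    """Task loss plus a weighted alignment penalty (weight is an assumption)."""
    return task_loss + weight * mmd_squared(source_feats, target_feats)


# Synthetic stand-ins for encoder outputs; these are not EMR data.
rng = random.Random(0)
source = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(50)]
aligned = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(50)]
shifted = [[rng.gauss(2.0, 1.0) for _ in range(4)] for _ in range(50)]

print("aligned penalty:", mmd_squared(source, aligned))
print("shifted penalty:", mmd_squared(source, shifted))
```

In this toy setup the shifted sample yields a larger penalty than the aligned one, so minimizing `total_loss` would push an encoder toward features whose distributions match across the two domains while still fitting the task.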

@article{zhang2025_2310.07799,
  title={Domain-invariant Clinical Representation Learning by Bridging Data Distribution Shift across EMR Datasets},
  author={Zhongji Zhang and Yuhang Wang and Yinghao Zhu and Xinyu Ma and Yasha Wang and Junyi Gao and Liantao Ma and Wen Tang and Xiaoyun Zhang and Ling Wang},
  journal={arXiv preprint arXiv:2310.07799},
  year={2025}
}