Integrative Analysis and Imputation of Multiple Data Streams via Deep Gaussian Processes

Abstract

Healthcare data, particularly in critical care settings, presents three key challenges for analysis. First, physiological measurements come from different sources but are inherently related, yet traditional methods often treat each measurement type independently, losing valuable information about their relationships. Second, clinical measurements are collected at irregular intervals, and these sampling times can carry clinical meaning. Third, missing values are prevalent. Whilst several imputation methods exist to tackle this common problem, they often fail to address the temporal nature of the data or to provide uncertainty estimates for their predictions. We propose using deep Gaussian process emulation with stochastic imputation, a methodology initially conceived for emulating computationally expensive models and quantifying their uncertainty, to handle the missing values that naturally occur in critical care data. This method leverages both longitudinal and cross-sectional information and provides uncertainty estimates for the imputed values. Our evaluation on a clinical dataset shows that the proposed method performs better than conventional methods, such as multiple imputation by chained equations (MICE), last-known value imputation, and individually fitted Gaussian processes (GPs).
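
To make two of the baselines named in the abstract concrete, the following is a minimal sketch of last-known value imputation and of an individually fitted GP applied to one irregularly sampled series. It is not the authors' deep Gaussian process method and does not reproduce their evaluation. The function names (`locf`, `rbf`, `gp_impute`), the toy heart-rate values, the squared-exponential kernel, and the fixed hyperparameters are all assumptions chosen for illustration; in practice the hyperparameters would be estimated from data.

```python
import numpy as np

def locf(values):
    """Last-known value imputation: carry the latest observation forward."""
    out = np.array(values, dtype=float)
    for i in range(1, len(out)):
        if np.isnan(out[i]):
            out[i] = out[i - 1]
    return out

def rbf(a, b, lengthscale, variance):
    """Squared-exponential kernel between two 1-D arrays of time points."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_impute(times, values, lengthscale=2.0, variance=25.0, noise=0.25):
    """Impute NaNs with the GP posterior mean; also return the posterior sd.

    Hyperparameters are fixed, illustrative assumptions, not fitted values.
    """
    times = np.asarray(times, dtype=float)
    values = np.asarray(values, dtype=float)
    obs = ~np.isnan(values)
    t_obs, y_obs, t_mis = times[obs], values[obs], times[~obs]

    mu = y_obs.mean()  # constant prior mean
    K = rbf(t_obs, t_obs, lengthscale, variance) + noise * np.eye(len(t_obs))
    Ks = rbf(t_mis, t_obs, lengthscale, variance)

    # Cholesky-based solve for numerical stability
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_obs - mu))
    mean = mu + Ks @ alpha
    v = np.linalg.solve(L, Ks.T)
    var = variance - np.sum(v ** 2, axis=0)

    filled = values.copy()
    filled[~obs] = mean
    sd = np.zeros_like(values)  # sd is reported only for imputed entries
    sd[~obs] = np.sqrt(np.maximum(var, 0.0))
    return filled, sd

# Toy, made-up series sampled at irregular times (hours), with two gaps
times = np.array([0.0, 1.0, 2.5, 4.0, 7.0, 8.0])
hr = np.array([80.0, 82.0, np.nan, 85.0, np.nan, 90.0])

carried = locf(hr)
filled, sd = gp_impute(times, hr)
print("LOCF:   ", carried)
print("GP mean:", np.round(filled, 2))
print("GP sd:  ", np.round(sd, 2))
```

Unlike the carried-forward values, the GP estimate depends on the actual spacing of the observation times and comes with a standard deviation for each imputed entry. Because the GP here is fitted to a single series, it cannot use the cross-sectional relationships between measurement types that the abstract identifies as valuable.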

@article{septiandri2025_2505.12076,
  title={Integrative Analysis and Imputation of Multiple Data Streams via Deep Gaussian Processes},
  author={Ali Akbar Septiandri and Deyu Ming and F. Alejandro DiazDelaO and Takoua Jendoubi and Samiran Ray},
  journal={arXiv preprint arXiv:2505.12076},
  year={2025}
}