CDPA: Common and Distinctive Pattern Analysis between High-dimensional Datasets

Abstract

A representative model in integrative analysis of two high-dimensional correlated datasets is to decompose each data matrix into a low-rank common matrix generated by latent factors shared across datasets, a low-rank distinctive matrix corresponding to each dataset, and an additive noise matrix. Existing decomposition methods claim that their common matrices capture the common pattern of the two datasets. However, their so-called common pattern only denotes the common latent factors but ignores the common information between the two coefficient matrices of these latent factors. We propose a novel method, called the common and distinctive pattern analysis (CDPA), which appropriately defines the two patterns by further incorporating the common and distinctive information of the coefficient matrices. A consistent estimation approach is developed for high-dimensional settings, and shows reasonably good finite-sample performance in simulations. The superiority of CDPA over state-of-the-art methods is corroborated in both simulated data and two real-data examples from the Human Connectome Project and The Cancer Genome Atlas. A Python package implementing the CDPA method is available at https://github.com/shu-hai/CDPA.
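
As a rough illustration of the decomposition model described in the abstract, the sketch below simulates two datasets that are each the sum of a low-rank common matrix, a low-rank distinctive matrix, and additive noise. It covers only the data-generating model, not the CDPA estimator, and it does not use the API of the CDPA package. The function name `simulate_two_datasets`, all parameter names, the chosen dimensions and ranks, and the Gaussian draws are assumptions made for this example.

```python
import numpy as np


def simulate_two_datasets(p1=50, p2=40, n=200, r_common=2, r_distinct=3,
                          noise_sd=0.1, seed=0):
    """Simulate Y_k = C_k + D_k + E_k for k = 1, 2 (hypothetical helper).

    C_k = B_k @ F uses latent factors F shared by both datasets,
    D_k = A_k @ G_k uses factors specific to dataset k,
    E_k is additive Gaussian noise (an assumption of this sketch).
    """
    rng = np.random.default_rng(seed)
    # Latent factors shared across the two datasets (r_common x n).
    F = rng.standard_normal((r_common, n))
    datasets = []
    for p in (p1, p2):
        B = rng.standard_normal((p, r_common))    # coefficients of shared factors
        A = rng.standard_normal((p, r_distinct))  # coefficients of distinctive factors
        G = rng.standard_normal((r_distinct, n))  # dataset-specific latent factors
        C = B @ F                                 # low-rank common matrix
        D = A @ G                                 # low-rank distinctive matrix
        E = noise_sd * rng.standard_normal((p, n))
        datasets.append({"Y": C + D + E, "C": C, "D": D, "E": E, "B": B})
    return datasets


d1, d2 = simulate_two_datasets()
# Both common matrices are built from the same factors F, so stacking them
# does not increase the rank beyond r_common.
stacked_rank = np.linalg.matrix_rank(np.vstack([d1["C"], d2["C"]]))
print(d1["Y"].shape, d2["Y"].shape, stacked_rank)
```

In this simulation the two common matrices share the latent factors `F`, yet their coefficient matrices `B` are drawn independently. That is the situation the abstract highlights: shared latent factors alone say nothing about how much the two coefficient matrices have in common.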
