Equivalence of Kernel Machine Regression and Kernel Distance Covariance
for Multidimensional Trait Association Studies
Finding associations between genetic markers and disease traits in high-dimensional samples is a challenging problem in statistics and science. Two classes of methods are used to examine the interactions between genetic markers and disease phenotypes: kernel machine regression (KMR) and kernel distance covariance (KDC). KMR is a semiparametric regression model that models covariate effects parametrically and genetic marker effects non-parametrically. KDC is a term we define for a class of methods that includes distance covariance (DC) and the Hilbert-Schmidt Independence Criterion (HSIC), a non-parametric statistic widely used in the machine learning community to test independence, given particular kernels. In this work, we show that the score test of KMR is equivalent to the KDC statistic under certain kernel conditions. We also propose a novel KDC test that incorporates covariate effects and show that the two tests remain equivalent in the presence of covariates. Our contributions are three-fold: (1) we establish the equivalence between KMR and KDC; (2) we show that the principles of kernel machine regression can be applied to the interpretation of KDC; (3) we show that the KMR statistic is a member of a broader class of KDC statistics whose members differ in their choice of kernels. We present the theoretical derivation of the KMR and KDC equivalence and support it empirically with simulation studies for both single and multiple traits. Finally, we use the ADNI study to explore the association between genetic variants in the gene FLJ16124 and phenotypes represented in 3D structural brain MR images, adjusting for age and gender. The results suggest that SNPs of FLJ16124 exhibit strong pairwise interaction effects that are correlated with changes in brain region volumes.
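To make the claimed equivalence concrete, the following is a minimal numerical sketch of the simplest case, not the paper's own implementation. It assumes a single trait, no covariates other than an intercept, a linear kernel on the genotypes, a linear kernel on the trait, and the biased empirical HSIC estimator trace(KHLH)/n^2. The function names (`linear_kernel`, `hsic_biased`, `kmr_score_numerator`) and the simulated data are hypothetical and chosen for illustration only. Under these assumptions, the quadratic form in the KMR score statistic equals n^2 times the HSIC statistic, because trace(K H y y^T H) = (Hy)^T K (Hy).

```python
import numpy as np


def linear_kernel(a):
    """Linear kernel matrix a a^T for an (n, p) array (illustrative choice)."""
    return a @ a.T


def hsic_biased(K, L):
    """Biased empirical HSIC: trace(K H L H) / n^2, with H the centering matrix."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return np.trace(K @ H @ L @ H) / n**2


def kmr_score_numerator(K, y):
    """Quadratic form (y - ybar)^T K (y - ybar) under an intercept-only null.

    Assumption: the usual variance scaling of the score statistic is omitted,
    since it does not depend on K.
    """
    r = y - y.mean()
    return float(r @ K @ r)


# Simulated data (hypothetical): n subjects, p SNPs coded 0/1/2, one trait.
rng = np.random.default_rng(0)
n, p = 50, 10
G = rng.integers(0, 3, size=(n, p)).astype(float)
y = rng.normal(size=n)

K = linear_kernel(G)   # kernel on genetic markers
L = np.outer(y, y)     # linear kernel on the trait

hsic = hsic_biased(K, L)
score = kmr_score_numerator(K, y)

# The two quantities agree up to the factor n^2.
print(np.isclose(n**2 * hsic, score))
```

Replacing the centering matrix H with a projection that removes the fitted covariate effects would be one way to sketch the covariate-adjusted case, but that variant is not shown here and its exact form should be taken from the paper.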