Optimal Estimation of Simultaneous Signals Using Absolute Inner Product with Applications to Integrative Genomics

Integrating summary statistics from genome-wide association studies (GWAS) and expression quantitative trait loci (eQTL) data provides a powerful way of identifying genes whose expression levels are potentially associated with complex diseases. A parameter called the $T$-score, which quantifies the genetic relatedness of genes to a disease phenotype based on the summary statistics, is introduced in terms of the mean values of two Gaussian sequences. Specifically, given two independent samples $\mathbf{x}_n \sim N(\theta, \Sigma_1)$ and $\mathbf{y}_n \sim N(\mu, \Sigma_2)$, where $\Sigma_1$ and $\Sigma_2$ have unit diagonals, the $T$-score is defined as $\sum_{i=1}^{n} |\theta_i \mu_i|$, a non-smooth functional that characterizes the degree of shared signals between the two absolute normal mean vectors $|\theta|$ and $|\mu|$. Using approximation theory, an estimator is constructed and shown to be minimax rate-optimal over various parameter spaces. Simulation studies demonstrate that the proposed estimator outperforms existing methods. The procedure is applied to an integrative analysis of heart failure genomics datasets, identifying several genes and biological pathways that are potentially causal to human heart failure.
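As a rough illustration of the quantity described above, the following Python sketch computes the absolute inner product of two mean vectors and contrasts it with a naive plug-in value obtained from noisy observations. The function names, the variable names `theta` and `mu`, the identity covariance, and the simulated sparse signals are all assumptions made for this example. The plug-in value is only a baseline and is not the minimax rate-optimal estimator constructed in the paper.

```python
import numpy as np


def absolute_inner_product(theta, mu):
    """Return sum_i |theta_i * mu_i| for two equal-length vectors."""
    theta = np.asarray(theta, dtype=float)
    mu = np.asarray(mu, dtype=float)
    if theta.shape != mu.shape:
        raise ValueError("theta and mu must have the same shape")
    return float(np.sum(np.abs(theta * mu)))


def naive_plugin_estimate(x, y):
    """Hypothetical baseline: plug the observations in place of the means.

    This is biased because |x_i * y_i| is positive in expectation even
    when the true means are zero; it is NOT the paper's estimator.
    """
    return absolute_inner_product(x, y)


def simulate(n=1000, n_shared=20, signal=3.0, seed=0):
    """Simulate sparse means with shared support and unit-variance noise.

    Identity covariance is an assumption of this sketch (it is one
    special case of covariance matrices with unit diagonals).
    """
    rng = np.random.default_rng(seed)
    theta = np.zeros(n)
    mu = np.zeros(n)
    theta[:n_shared] = signal   # shared signal coordinates
    mu[:n_shared] = -signal     # opposite sign; absolute values still overlap
    x = theta + rng.standard_normal(n)
    y = mu + rng.standard_normal(n)
    return theta, mu, x, y


if __name__ == "__main__":
    theta, mu, x, y = simulate()
    print("true value:   ", absolute_inner_product(theta, mu))
    print("naive plug-in:", naive_plugin_estimate(x, y))
```

In this setup the plug-in value tends to overshoot the true value, since every pure-noise coordinate contributes a positive term. That bias, together with the non-smoothness of the absolute value at zero, is the kind of difficulty a more careful estimator has to address.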