Optimal Estimation of Simultaneous Signals Using Absolute Inner Product with Applications to Integrative Genomics

Abstract

Integrating the summary statistics from genome-wide association study (GWAS) and expression quantitative trait loci (eQTL) data provides a powerful way of identifying the genes whose expression levels are potentially associated with complex diseases. A parameter called $T$-score that quantifies the genetic relatedness of genes to disease phenotype based on the summary statistics is introduced based on the mean values of two Gaussian sequences. Specifically, given two independent samples ${\bf x}_n\sim N(\theta, \Sigma_1)$ and ${\bf y}_n\sim N(\mu, \Sigma_2)$, where $\Sigma_1$ and $\Sigma_2$ have unit diagonals, $T$-score is defined as $\sum_{i=1}^n |\theta_i|\cdot |\mu_i|$, a non-smooth functional, which characterizes the degree of shared signals between two absolute normal mean vectors $|\theta|$ and $|\mu|$. Using approximation theory, an estimator is constructed and shown to be minimax rate-optimal over various parameter spaces. Simulation studies demonstrate that the proposed estimator outperforms existing methods. The procedure is applied to an integrative analysis of heart failure genomics datasets and we identify several genes and biological pathways that are potentially causal to human heart failure.
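
To make the definition concrete, the following minimal Python sketch evaluates the functional $\sum_{i=1}^n |\theta_i|\cdot |\mu_i|$ on simulated data and compares it with a naive plug-in that substitutes the observations for the unknown means. It is an illustration only and is not the estimator constructed in the paper: the function names `t_score` and `naive_plugin_estimate`, the sparse signal pattern, and the identity covariances are all assumptions made for this example.

```python
import numpy as np


def t_score(theta, mu):
    """Evaluate the functional sum_i |theta_i| * |mu_i| for two mean vectors."""
    theta = np.asarray(theta, dtype=float)
    mu = np.asarray(mu, dtype=float)
    return float(np.sum(np.abs(theta) * np.abs(mu)))


def naive_plugin_estimate(x, y):
    """Naive plug-in: use the observations in place of the unknown means.

    Hypothetical baseline for illustration; NOT the paper's estimator.
    """
    return t_score(x, y)


# Assumed toy setting: n coordinates, sparse signals, identity covariances.
rng = np.random.default_rng(0)
n = 1000
theta = np.zeros(n)
theta[:20] = 3.0        # signals in coordinates 0..19
mu = np.zeros(n)
mu[10:30] = -2.0        # signals in coordinates 10..29 (sign is irrelevant)

x = theta + rng.standard_normal(n)   # x ~ N(theta, I)
y = mu + rng.standard_normal(n)      # y ~ N(mu, I)

true_value = t_score(theta, mu)      # 10 shared coordinates, each contributing 3 * 2
naive_value = naive_plugin_estimate(x, y)

print("true value:", true_value)
print("naive plug-in:", naive_value)
```

In this toy setting the naive plug-in is inflated by the many coordinates that carry no signal, since the product of two absolute noise terms has positive mean. This illustrates why estimating a non-smooth functional of this kind calls for more care than direct substitution.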
