Estimating The Proportion of Signal Variables Under Arbitrary Covariance Dependence
Accurately estimating the proportion of signals hidden among a large number of noise variables is of interest in many scientific inquiries. In this paper, we consider realistic but theoretically challenging settings with arbitrary covariance dependence between variables. We define the mean absolute correlation (MAC) to measure the overall dependence strength and investigate the performance of a family of estimators over the full range of MAC. We make explicit the joint effect of MAC and signal sparsity on the performance of these estimators and discover that the most powerful estimator under independence is no longer the most effective when the MAC dependence is strong enough. Motivated by this theoretical insight, we propose a new estimator that better adapts to arbitrary covariance dependence. The proposed method compares favorably to several existing methods in extensive finite-sample settings with strong to weak covariance dependence and real dependence structures from genetic association studies.
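As a rough illustration of how a quantity like MAC might be computed, the sketch below averages the absolute off-diagonal entries of a correlation matrix. The abstract does not state the formula, so this definition (including the choice to exclude the diagonal) is an assumption, and the function name `mean_absolute_correlation` and the example matrices are hypothetical.

```python
def mean_absolute_correlation(corr):
    """Average of |corr[i][j]| over all pairs i != j.

    Assumed definition for illustration only; the paper's exact
    formula for MAC may differ (e.g. in how the diagonal is handled).
    """
    p = len(corr)
    if p < 2:
        raise ValueError("need at least two variables")
    total = sum(abs(corr[i][j]) for i in range(p) for j in range(p) if i != j)
    return total / (p * (p - 1))


# Hypothetical example 1: four independent variables (identity correlation).
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]

# Hypothetical example 2: four equicorrelated variables with correlation 0.5.
rho = 0.5
equi = [[1.0 if i == j else rho for j in range(4)] for i in range(4)]

mac_identity = mean_absolute_correlation(identity)
mac_equi = mean_absolute_correlation(equi)
print(mac_identity, mac_equi)
```

Under this assumed definition, independence gives a value of zero and stronger pairwise correlation pushes the value toward one, which matches the abstract's description of MAC as a measure of overall dependence strength.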