The power of big data sparse signal detection tests on nonparametric detection boundaries

Weak and sparse (or dense) signals within high-dimensional data or Big Data are well studied in the literature with respect to detection, feature selection and estimation of the number of signals. In this paper we focus on the quality of detection tests for signals. For various (mainly parametric) models it is known that the detection boundaries of the log-likelihood ratio test and of the higher criticism test of Tukey coincide asymptotically. In contrast, less is known about the behavior of tests on the detection boundary itself, especially for the higher criticism test. We fill this gap with a detailed analysis on the detection boundary. For the log-likelihood ratio test we explain how to determine the detection boundary, the nontrivial limits of its test statistics on this boundary, and asymptotic efficiency in the spirit of Pitman. We also give general tools to handle the higher criticism statistics. Besides these general results, we discuss two specific models in more detail: the well-known heteroscedastic normal mixture model and a nonparametric model for p-values given by tests for sparse signals. For these models we verify that the higher criticism test has no asymptotic power on the detection boundary, while the log-likelihood ratio test has nontrivial power there.
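To make the higher criticism statistic concrete, the following minimal Python sketch computes it from a list of p-values, using the standardized form commonly attributed to Donoho and Jin, max over i of sqrt(n) (i/n - p_(i)) / sqrt(p_(i) (1 - p_(i))). It is an illustration and not the construction analyzed in the paper. The function names, the truncation fraction `alpha0`, and the mixture parametrization (1 - eps) N(0, 1) + eps N(mu, tau^2) are assumptions introduced here.

```python
import math
import random


def higher_criticism(p_values, alpha0=0.5):
    """Higher criticism statistic over the smallest alpha0 * n p-values.

    alpha0 is a hypothetical truncation fraction chosen for this sketch.
    """
    n = len(p_values)
    p_sorted = sorted(p_values)
    k_max = max(1, int(alpha0 * n))
    best = -math.inf
    for i in range(1, k_max + 1):
        p = p_sorted[i - 1]
        if p <= 0.0 or p >= 1.0:
            continue  # skip degenerate p-values to avoid division by zero
        # Standardized gap between the empirical and the uniform null CDF at p.
        z = math.sqrt(n) * (i / n - p) / math.sqrt(p * (1.0 - p))
        best = max(best, z)
    return best


def upper_tail_p(x):
    """One-sided p-value P(Z > x) for a standard normal Z."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def simulate_p_values(n, eps, mu, tau, rng):
    """p-values from the assumed mixture (1 - eps) N(0, 1) + eps N(mu, tau^2)."""
    out = []
    for _ in range(n):
        if rng.random() < eps:
            x = rng.gauss(mu, tau)  # signal component
        else:
            x = rng.gauss(0.0, 1.0)  # null component
        out.append(upper_tail_p(x))
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    null_p = simulate_p_values(10_000, 0.0, 0.0, 1.0, rng)
    signal_p = simulate_p_values(10_000, 0.01, 3.0, 1.5, rng)
    print("HC under the null:       ", higher_criticism(null_p))
    print("HC with sparse signals:  ", higher_criticism(signal_p))
```

Large values of the statistic speak against the global null hypothesis of no signals. Calibrating a critical value, for example by simulating the statistic under the null, is left out of this sketch.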