55
1

Large-scale adaptive multiple testing for sequential data controlling false discovery and nondiscovery rates

Abstract

In modern scientific experiments, we frequently encounter data that have large dimensions, and in some experiments, such high dimensional data arrive sequentially rather than full data being available all at a time. We develop multiple testing procedures with simultaneous control of false discovery and nondiscovery rates when mm-variate data vectors X1,X2,\mathbf{X}_1, \mathbf{X}_2, \dots are observed sequentially or in groups and each coordinate of these vectors leads to a hypothesis testing. Existing multiple testing methods for sequential data uses fixed stopping boundaries that do not depend on sample size, and hence, are quite conservative when the number of hypotheses mm is large. We propose sequential tests based on adaptive stopping boundaries that ensure shrinkage of the continue sampling region as the sample size increases. Under minimal assumptions on the data sequence, we first develop a test based on an oracle test statistic such that both false discovery rate (FDR) and false nondiscovery rate (FNR) are nearly equal to some prefixed levels with strong control. Under a two-group mixture model assumption, we propose a data-driven stopping and decision rule based on local false discovery rate statistic that mimics the oracle rule and guarantees simultaneous control of FDR and FNR asymptotically as mm tends to infinity. Both the oracle and the data-driven stopping times are shown to be finite (i.e., proper) with probability 1 for all finite mm and converge to a finite constant as mm grows to infinity. Further, we compare the data-driven test with the existing gap rule proposed in He and Bartroff (2021) and show that the ratio of the expected sample sizes of our method and the gap rule tends to zero as mm goes to infinity. Extensive analysis of simulated datasets as well as some real datasets illustrate the superiority of the proposed tests over some existing methods.

View on arXiv
Comments on this paper