
Randomized incomplete U-statistics in high dimensions

Abstract

This paper studies inference for the mean vector of a high-dimensional U-statistic. In the era of Big Data, the dimension d of the U-statistic and the sample size n of the observations both tend to be large, and the computation of the U-statistic is prohibitively demanding. Data-dependent inferential procedures such as the empirical bootstrap for U-statistics are even more computationally expensive. To overcome this computational bottleneck, incomplete U-statistics obtained by sampling fewer terms of the U-statistic are attractive alternatives. In this paper, we introduce randomized incomplete U-statistics with sparse weights whose computational cost can be made independent of the order of the U-statistic. We derive non-asymptotic Gaussian approximation error bounds for the randomized incomplete U-statistics in high dimensions, namely in cases where the dimension d is possibly much larger than the sample size n, for both non-degenerate and degenerate kernels. In addition, we propose novel and generic bootstrap methods for the incomplete U-statistics that are computationally much less demanding than existing bootstrap methods, and establish finite sample validity of the proposed bootstrap methods. The proposed bootstrap methods are illustrated with an application to nonparametric testing for the pairwise independence of a high-dimensional random vector under weaker assumptions than those appearing in the literature.
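
To make the idea of "sampling fewer terms" concrete, the following is a minimal sketch of an incomplete U-statistic of order 2, not the paper's exact construction. It assumes sampling with replacement of N index pairs, and an illustrative kernel h(x, y) = (x - y)^2 / 2 applied coordinate-wise, whose complete U-statistic is the unbiased sample variance of each coordinate. The names `kernel` and `incomplete_u_statistic` and the parameter `N` (the computational budget) are hypothetical choices made for this example.

```python
import numpy as np


def kernel(x, y):
    # Illustrative symmetric kernel of order 2, applied coordinate-wise.
    # Its complete U-statistic is the unbiased sample variance.
    return 0.5 * (x - y) ** 2


def incomplete_u_statistic(X, N, rng):
    """Average the kernel over N randomly drawn pairs (i, j) with i != j.

    Assumption: pairs are drawn uniformly with replacement, so the cost
    is O(N * d) rather than O(n^2 * d) for the complete U-statistic.
    """
    n = X.shape[0]
    i = rng.integers(0, n, size=N)
    # Draw j uniformly from the n - 1 indices different from i.
    j = rng.integers(0, n - 1, size=N)
    j = j + (j >= i)
    return kernel(X[i], X[j]).mean(axis=0)


rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))            # n = 200 observations, d = 5
approx = incomplete_u_statistic(X, N=2000, rng=rng)
exact = X.var(axis=0, ddof=1)                # complete U-statistic for this kernel
print(np.max(np.abs(approx - exact)))        # sampling error of the incomplete version
```

The budget N trades accuracy for computation: the sketch evaluates only N kernel terms instead of all n(n - 1)/2 pairs, at the price of additional randomness in the estimate.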
