Bootstrap in High Dimension with Low Computation

20 October 2022

Abstract

The bootstrap is a popular data-driven method for quantifying statistical uncertainty, but in modern high-dimensional problems it can incur huge computational costs because resamples must be generated and models refit repeatedly. We study the use of the bootstrap in high-dimensional settings with a small number of resamples. In particular, we show that by exploiting the sample-resample independence underlying a recent "cheap" bootstrap perspective, as few as one resample can attain valid coverage even when the dimension grows at a rate close to that of the sample size, thus supporting the implementability of the bootstrap for large-scale problems. We validate our theoretical results and compare the performance of our approach with other benchmarks via a range of experiments.
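
To make the idea of an interval from very few resamples concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not state the interval formula, so the sketch assumes the construction commonly associated with the cheap bootstrap: the point estimate plus or minus a Student's t critical value with B degrees of freedom times the root mean squared deviation of the B resample estimates from the original estimate. The function name `cheap_bootstrap_ci`, the table `T_975`, and the scalar sample-mean target are all hypothetical choices made for illustration; the high-dimensional setting studied in the paper is not reproduced here.

```python
import random
import statistics

# Two-sided 95% critical values of Student's t (its 0.975 quantile),
# keyed by degrees of freedom. These are standard t-table values.
T_975 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571}


def cheap_bootstrap_ci(data, estimator, n_resamples=1, rng=None):
    """Return (lower, estimate, upper) for a 95% interval.

    Assumed construction: estimate +/- t_{B, 0.975} * S, where S^2 is the
    mean squared deviation of the B resample estimates from the original
    estimate. Only B in 1..5 is supported by the small table above.
    """
    rng = rng or random.Random()
    n = len(data)
    point = estimator(data)

    sq_devs = []
    for _ in range(n_resamples):
        # Draw n observations with replacement and refit the estimator.
        resample = [data[rng.randrange(n)] for _ in range(n)]
        sq_devs.append((estimator(resample) - point) ** 2)

    # Deviations are centred at the original estimate, not the resample mean,
    # so the spread is well defined even with a single resample.
    spread = (sum(sq_devs) / n_resamples) ** 0.5
    half_width = T_975[n_resamples] * spread
    return point - half_width, point, point + half_width


rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(200)]
lower, estimate, upper = cheap_bootstrap_ci(
    data, statistics.fmean, n_resamples=1, rng=rng
)
print(f"estimate={estimate:.3f}, interval=({lower:.3f}, {upper:.3f})")
```

With a single resample the critical value is large (12.706), so the interval is wide; under the assumed construction this is the cost of refitting the model only once, and allowing a few more resamples shrinks the critical value quickly.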
