Bootstrapping and Sample Splitting For High-Dimensional, Assumption-Free Inference

Several new methods have been proposed for performing valid inference after model selection. An older method is sampling splitting: use part of the data for model selection and part for inference. In this paper we revisit sample splitting combined with the bootstrap (or the Normal approximation). We show that this leads to a simple, assumption-free approach to inference and we establish results on the accuracy of the method. In fact, we find new bounds on the accuracy of the bootstrap and the Normal approximation for general nonlinear parameters with increasing dimension which we then use to assess the accuracy of regression inference. We show that an alternative, called the image bootstrap, has higher coverage accuracy at the cost of more computation. We define new parameters that measure variable importance and that can be inferred with greater accuracy than the usual regression coefficients. There is a inference-prediction tradeoff: splitting increases the accuracy and robustness of inference but can decrease the accuracy of the predictions.
View on arXiv