Scalable subsampling: computation, aggregation and inference

13 December 2021
D. Politis
arXiv:2112.06434
Abstract

Subsampling is a general statistical method developed in the 1990s aimed at estimating the sampling distribution of a statistic $\hat\theta_n$ in order to conduct nonparametric inference such as the construction of confidence intervals and hypothesis tests. Subsampling has seen a resurgence in the Big Data era where the standard, full-resample-size bootstrap can be infeasible to compute. Nevertheless, even choosing a single random subsample of size $b$ can be computationally challenging with both $b$ and the sample size $n$ being very large. In the paper at hand, we show how a set of appropriately chosen, non-random subsamples can be used to conduct effective -- and computationally feasible -- distribution estimation via subsampling. Further, we show how the same set of subsamples can be used to yield a procedure for subsampling aggregation -- also known as subagging -- that is scalable with big data. Interestingly, the scalable subagging estimator can be tuned to have the same (or better) rate of convergence as compared to $\hat\theta_n$. The paper is concluded by showing how to conduct inference, e.g., confidence intervals, based on the scalable subagging estimator instead of the original $\hat\theta_n$.
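The abstract does not spell out the subsample construction, but the simplest instance of "appropriately chosen, non-random subsamples" is the set of $\lfloor n/b \rfloor$ consecutive, non-overlapping blocks of size $b$. The Python sketch below is an illustration of that special case, not the paper's exact procedure: it assumes i.i.d. data and a $\sqrt{n}$-convergent statistic, averages the block statistics to form a subagging estimate, and uses the empirical quantiles of $\sqrt{b}(\hat\theta_{b,i} - \hat\theta_n)$ to approximate the distribution of $\sqrt{n}(\hat\theta_n - \theta)$ for a confidence interval. The function name and defaults are hypothetical.

import numpy as np

def scalable_subsampling_ci(data, b, stat=np.mean, alpha=0.05):
    # Sketch only: non-overlapping blocks of size b, i.i.d. data,
    # sqrt(n)-convergent statistic assumed throughout.
    n = len(data)
    q = n // b  # number of deterministic (non-random) subsamples
    theta_n = stat(data)
    # Statistic recomputed on each block; no random subsample draws needed.
    theta_b = np.array([stat(data[i * b:(i + 1) * b]) for i in range(q)])
    # Subagging estimate: average of the block statistics.
    theta_subag = theta_b.mean()
    # Subsampling "roots": sqrt(b)*(theta_b - theta_n) approximates the
    # sampling distribution of sqrt(n)*(theta_n - theta).
    roots = np.sqrt(b) * (theta_b - theta_n)
    q_lo, q_hi = np.quantile(roots, [alpha / 2, 1 - alpha / 2])
    # Invert the roots' quantiles to get an equal-tailed interval.
    ci = (theta_n - q_hi / np.sqrt(n), theta_n - q_lo / np.sqrt(n))
    return theta_subag, ci

rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000)
est, (lo, hi) = scalable_subsampling_ci(x, b=10_000)
print(f"subagging estimate: {est:.5f}, 95% CI for the mean: ({lo:.5f}, {hi:.5f})")

Each pass over the data touches every observation exactly once, which is what makes this scheme feasible when both $b$ and $n$ are large; a single random subsample of size $b$, by contrast, requires costly random access into the full dataset.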
