On the Subbagging Estimation for Massive Data

28 February 2021
Tao Zou
Xian Li
Xuan Liang
Hansheng Wang
arXiv:2103.00631
Abstract

This article introduces subbagging (subsample aggregating) estimation approaches for big data analysis under the memory constraints of computers. Specifically, from the whole dataset of size $N$, $m_N$ subsamples are randomly drawn, and each subsample of size $k_N \ll N$, chosen to meet the memory constraint, is sampled uniformly without replacement. Aggregating the estimators across the $m_N$ subsamples yields the subbagging estimator. To analyze its theoretical properties, we adapt the theory of incomplete $U$-statistics with an infinite-order kernel to allow overlap among the drawn subsamples. Using this novel theoretical framework, we demonstrate that, with a proper selection of the hyperparameters $k_N$ and $m_N$, the subbagging estimator achieves $\sqrt{N}$-consistency and asymptotic normality under the condition $(k_N m_N)/N \to \alpha \in (0, \infty]$. Compared to the full-sample estimator, we show theoretically that the $\sqrt{N}$-consistent subbagging estimator has an inflation rate of $1/\alpha$ in its asymptotic variance. Simulation experiments demonstrate the finite-sample performance. An American airline dataset is analyzed to illustrate that the subbagging estimate is numerically close to the full-sample estimate while being computationally fast under the memory constraint.
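The sampling-and-aggregating procedure described in the abstract can be sketched in a few lines. The Python snippet below is a minimal illustration, not the paper's implementation: the function name `subbagging_estimate` is hypothetical, and the sample mean stands in for a generic estimator, whereas the paper covers general estimators and their asymptotic theory.

```python
import numpy as np

def subbagging_estimate(data, k_N, m_N, estimator, seed=None):
    """Subbagging sketch: draw m_N subsamples of size k_N, each
    uniformly without replacement from the full dataset (subsamples
    may overlap across draws), then average the per-subsample
    estimates. `estimator` maps a subsample array to an estimate."""
    rng = np.random.default_rng(seed)
    N = len(data)
    estimates = []
    for _ in range(m_N):
        # Uniform sampling without replacement within each draw,
        # keeping only k_N << N observations in memory at a time.
        idx = rng.choice(N, size=k_N, replace=False)
        estimates.append(estimator(data[idx]))
    return np.mean(estimates, axis=0)

# Toy usage: estimating a population mean. With (k_N * m_N) / N -> alpha
# in (0, inf], the paper's theory gives sqrt(N)-consistency, with the
# asymptotic variance inflated by 1/alpha relative to the full-sample
# estimator.
data = np.random.default_rng(0).normal(loc=2.0, scale=1.0, size=1_000_000)
theta_hat = subbagging_estimate(data, k_N=10_000, m_N=200,
                                estimator=np.mean, seed=1)
```

Here $k_N m_N / N = 2$, i.e. $\alpha = 2$, so under the abstract's result the aggregate estimate pays only a $1/\alpha = 0.5$ variance inflation while never loading more than $k_N$ observations per draw.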
