Support recovery via weighted maximum-contrast subagging

14 June 2013

Abstract

In this paper, we consider subagging for non-smooth estimation and model selection in sparse linear regression settings. The proposed weighted maximum-contrast subagging scales to datasets of arbitrary size while still achieving excellent support recovery. This makes it particularly relevant for computation over complete datasets of extremely large scale, where traditional methods might be impractical. We develop theory in support of the claim that the proposed method has tight error control over both false positives and false negatives, regardless of the size of the dataset. Unlike existing methods, it allows for oracle-like properties even when the aggregated estimators do not themselves have oracle-like properties. Moreover, we show the limitations of traditional subagging in cases where the subsamples are of much smaller order than the original data. In such situations, traditional subagging results in a discontinuous estimated support set and never recovers the sparsity set when at least one of the aggregated estimators has a probability of support recovery strictly less than 1. Furthermore, we design an adaptive procedure for selecting the tuning parameters and the optimal weighting scheme. It simultaneously alleviates the overall computational burden and relaxes the eigenvalue conditions on the design matrix. Finally, we validate our theoretical findings through a simulation study and an analysis of a part of the million-song-challenge dataset.
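To make the limitation of traditional subagging concrete, the following is a minimal sketch and not the paper's weighted maximum-contrast procedure. It fits a Lasso on several small subsamples and compares two aggregation rules: the support of the plain average, which (barring exact cancellation) is the union of the per-subsample supports, and a simple selection-frequency vote used here as a generic stand-in. The function names, the penalty `lam`, the subsample size, and the vote threshold are all illustrative assumptions.

```python
import random

def soft_threshold(z, t):
    # Soft-thresholding operator used by the Lasso coordinate update.
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_cd(X, y, lam, n_iter=100):
    # Cyclic coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1.
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - Xb, with b = 0 initially
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual.
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * X[i][j]
                beta[j] = new
    return beta

def subagged_supports(X, y, lam, n_sub, n_rounds, vote=0.5, seed=0):
    # Hypothetical helper: returns (union support, frequency-voted support).
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_rounds):
        idx = rng.sample(range(n), n_sub)  # subsample without replacement
        beta = lasso_cd([X[i] for i in idx], [y[i] for i in idx], lam)
        for j in range(p):
            if beta[j] != 0.0:
                counts[j] += 1
    # A feature selected even once stays in the support of the plain average.
    union = {j for j in range(p) if counts[j] > 0}
    # Stand-in rule: keep features selected in at least a `vote` share of rounds.
    voted = {j for j in range(p) if counts[j] >= vote * n_rounds}
    return union, voted

# Synthetic sparse regression: only features 0 and 1 are active (assumed setup).
rng = random.Random(1)
n, p = 400, 10
true_beta = [2.0, -2.0] + [0.0] * (p - 2)
X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
y = [sum(X[i][j] * true_beta[j] for j in range(p)) + rng.gauss(0.0, 1.0)
     for i in range(n)]

union, voted = subagged_supports(X, y, lam=0.3, n_sub=40, n_rounds=20)
print("union support:", sorted(union))
print("voted support:", sorted(voted))
```

By construction the voted support is always contained in the union support, so any false positive produced by a single subsample estimator survives plain averaging but can be filtered out by a vote.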
