
Thresholded Lasso for high dimensional variable selection and statistical estimation

Abstract

Given $n$ noisy samples with $p$ dimensions, where $n \ll p$, we show that the multi-step thresholding procedure based on the Lasso -- we call it the {\it Thresholded Lasso} -- can accurately estimate a sparse vector $\beta \in \R^p$ in a linear model $Y = X \beta + \epsilon$, where $X_{n \times p}$ is a design matrix normalized to have column $\ell_2$ norm $\sqrt{n}$, and $\epsilon \sim N(0, \sigma^2 I_n)$. We show that under the restricted eigenvalue (RE) condition (Bickel-Ritov-Tsybakov 09), it is possible to achieve the $\ell_2$ loss within a logarithmic factor of the ideal mean square error one would achieve with an {\em oracle} while selecting a sufficiently sparse model -- hence achieving {\it sparse oracle inequalities}; the oracle would supply perfect information about which coordinates are non-zero and which are above the noise level. In some sense, the Thresholded Lasso recovers the choices that would have been made by the $\ell_0$ penalized least squares estimators, in that it selects a sufficiently sparse model without sacrificing accuracy in estimating $\beta$ and in predicting $X \beta$. We also show that for the Gauss-Dantzig selector (Cand\`{e}s-Tao 07), if $X$ obeys a uniform uncertainty principle and if the true parameter is sufficiently sparse, one achieves the sparse oracle inequalities as above, while allowing at most $s_0$ irrelevant variables in the model in the worst case, where $s_0 \leq s$ is the smallest integer such that, for $\lambda = \sqrt{2 \log p / n}$, $\sum_{i=1}^p \min(\beta_i^2, \lambda^2 \sigma^2) \leq s_0 \lambda^2 \sigma^2$. Our simulation results on the Thresholded Lasso agree closely with our theoretical analysis.
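As a rough illustration of the multi-step procedure described above (not the authors' implementation), the following Python sketch carries out a two-step Thresholded Lasso: fit the Lasso at the level $\lambda \sigma$ with $\lambda = \sqrt{2 \log p / n}$, threshold the estimated coefficients, and refit ordinary least squares on the selected support. The use of scikit-learn's Lasso, a known noise level sigma, and the choice of threshold constant are all assumptions made for illustration.

# Minimal sketch of a two-step Thresholded Lasso (illustration only; the
# threshold constant and the known noise level sigma are assumptions).
import numpy as np
from sklearn.linear_model import Lasso

def thresholded_lasso(X, y, sigma, thresh_const=1.0):
    n, p = X.shape
    lam = sigma * np.sqrt(2 * np.log(p) / n)  # lambda * sigma, as in the abstract

    # Step 1: initial Lasso fit (X assumed column-normalized to l2 norm sqrt(n)).
    beta_init = Lasso(alpha=lam, fit_intercept=False).fit(X, y).coef_

    # Step 2: threshold small coefficients to obtain a sparse support set.
    support = np.flatnonzero(np.abs(beta_init) > thresh_const * lam)

    # Step 3: refit by ordinary least squares on the selected support,
    # which removes the Lasso's shrinkage on the retained coordinates.
    beta_hat = np.zeros(p)
    if support.size > 0:
        beta_hat[support] = np.linalg.lstsq(X[:, support], y, rcond=None)[0]
    return beta_hat, support

The least squares refit on the thresholded support is what lets the two-step estimator keep a sparse model without the bias of a single Lasso pass, in the spirit of the sparse oracle inequalities stated above.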
