
SLOPE is Adaptive to Unknown Sparsity and Asymptotically Minimax

Weijie Su
Emmanuel Candes
Abstract

We consider high-dimensional sparse regression problems in which we observe $y = X\beta + z$, where $X$ is an $n \times p$ design matrix and $z$ is an $n$-dimensional vector of independent Gaussian errors, each with variance $\sigma^2$. Our focus is on the recently introduced SLOPE estimator (Bogdan et al., 2014), which regularizes the least-squares estimates with the rank-dependent penalty $\sum_{1 \le i \le p} \lambda_i |\hat\beta|_{(i)}$, where $|\hat\beta|_{(i)}$ is the $i$th largest magnitude of the fitted coefficients. Under Gaussian designs, where the entries of $X$ are i.i.d. $\mathcal{N}(0, 1/n)$, we show that SLOPE, with weights $\lambda_i$ just about equal to $\sigma \cdot \Phi^{-1}(1 - iq/(2p))$ (here $\Phi^{-1}(\alpha)$ is the $\alpha$th quantile of a standard normal and $q$ is a fixed number in $(0,1)$), achieves a squared estimation error obeying
\[
\sup_{\|\beta\|_0 \le k} \, \mathbb{P}\left( \|\hat{\beta}_{\text{SLOPE}} - \beta\|^2 > (1+\epsilon)\, 2\sigma^2 k \log(p/k) \right) \longrightarrow 0
\]
as the dimension $p$ tends to $\infty$, where $\epsilon > 0$ is an arbitrarily small constant. This holds under a weak assumption on the $\ell_0$-sparsity level, namely, $k/p \rightarrow 0$ and $(k \log p)/n \rightarrow 0$, and is sharp in the sense that this is the best possible error any estimator can achieve. A remarkable feature is that SLOPE does not require any knowledge of the degree of sparsity, and yet automatically adapts to yield optimal total squared errors over a wide range of $\ell_0$-sparsity classes. We are not aware of any other estimator with this property.
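To make the penalty concrete, here is a minimal sketch, in Python with NumPy/SciPy (a language choice the paper does not prescribe), of how one might compute the BH-type weights $\lambda_i = \sigma \cdot \Phi^{-1}(1 - iq/(2p))$ and fit SLOPE by proximal gradient descent. The prox of the sorted-$\ell_1$ penalty is evaluated with a stack-based pool-adjacent-violators pass in the spirit of the FastProxSL1 algorithm of Bogdan et al.; the function names (`slope_weights`, `prox_sorted_l1`, `slope`) and the fixed iteration count are illustrative choices, not part of the paper.

```python
import numpy as np
from scipy.stats import norm

def slope_weights(p, q=0.05, sigma=1.0):
    # BH-type weights from the abstract: lambda_i = sigma * Phi^{-1}(1 - i*q/(2p))
    i = np.arange(1, p + 1)
    return sigma * norm.ppf(1 - i * q / (2 * p))

def prox_sorted_l1(v, lam):
    # Prox of J(x) = sum_i lam_i * |x|_(i): sort |v| in decreasing order,
    # subtract the rank-dependent weights, restore monotonicity by pooling
    # adjacent violating blocks (a stack-based PAVA pass), and clip at zero.
    sign, u = np.sign(v), np.abs(v)
    order = np.argsort(-u)
    diff = u[order] - lam
    blocks = []                        # stack of (start, end, block average)
    for i, d in enumerate(diff):
        start, end, avg = i, i, d
        while blocks and blocks[-1][2] <= avg:
            s, e, a = blocks.pop()     # merge with the previous block
            n1, n2 = e - s + 1, end - start + 1
            avg = (a * n1 + avg * n2) / (n1 + n2)
            start = s
        blocks.append((start, end, avg))
    x_sorted = np.zeros_like(u)
    for s, e, a in blocks:
        x_sorted[s:e + 1] = max(a, 0.0)
    x = np.empty_like(v)
    x[order] = x_sorted                # undo the sort
    return sign * x

def slope(X, y, lam, n_iter=500):
    # Proximal gradient on (1/2)||y - Xb||^2 + sum_i lam_i * |b|_(i).
    L = np.linalg.norm(X, 2) ** 2      # Lipschitz constant of the gradient
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        b = prox_sorted_l1(b - X.T @ (X @ b - y) / L, lam / L)
    return b
```

A fixed step size $1/L$ and a fixed iteration count keep the sketch short; a production solver would typically add FISTA-style acceleration and a principled stopping rule.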
