
SLOPE is Adaptive to Unknown Sparsity and Asymptotically Minimax

Weijie Su
Emmanuel Candes
Abstract

We consider high-dimensional sparse regression problems in which we observe $y = X\beta + z$, where $X$ is an $n \times p$ design matrix and $z$ is an $n$-dimensional vector of independent Gaussian errors, each with variance $\sigma^2$. Our focus is on the recently introduced SLOPE estimator (Bogdan et al., 2014), which regularizes the least-squares estimates with the rank-dependent penalty $\sum_{1 \le i \le p} \lambda_i |\hat\beta|_{(i)}$, where $|\hat\beta|_{(i)}$ is the $i$th largest magnitude of the fitted coefficients. Under Gaussian designs, where the entries of $X$ are i.i.d. $\mathcal{N}(0, 1/n)$, we show that SLOPE, with weights $\lambda_i$ just about equal to $\sigma \cdot \Phi^{-1}(1 - iq/(2p))$ (here $\Phi^{-1}(\alpha)$ is the $\alpha$th quantile of a standard normal and $q$ is a fixed number in $(0,1)$), achieves a squared estimation error obeying
\[
\sup_{\|\beta\|_0 \le k} \, \mathbb{P}\left( \|\hat{\beta}_{\text{SLOPE}} - \beta\|^2 > (1+\epsilon)\, 2\sigma^2 k \log(p/k) \right) \longrightarrow 0
\]
as the dimension $p$ tends to $\infty$, where $\epsilon > 0$ is an arbitrarily small constant. This holds under a weak assumption on the $\ell_0$-sparsity level, namely, $k/p \rightarrow 0$ and $(k \log p)/n \rightarrow 0$, and is sharp in the sense that this is the best possible error any estimator can achieve. A remarkable feature is that SLOPE does not require any knowledge of the degree of sparsity, and yet automatically adapts to yield optimal total squared errors over a wide range of $\ell_0$-sparsity classes. We are not aware of any other estimator with this property.
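To make the penalty concrete, here is a minimal sketch, in Python with NumPy/SciPy (a language choice the paper does not prescribe), of how one might compute the BH-type weights $\lambda_i = \sigma \cdot \Phi^{-1}(1 - iq/(2p))$ and fit SLOPE by proximal gradient descent. The prox of the sorted-$\ell_1$ penalty is evaluated with a stack-based pool-adjacent-violators pass in the spirit of the FastProxSL1 algorithm of Bogdan et al.; the function names (`slope_weights`, `prox_sorted_l1`, `slope`) and the fixed iteration count are illustrative choices, not part of the paper.

```python
import numpy as np
from scipy.stats import norm

def slope_weights(p, q=0.05, sigma=1.0):
    # BH-type weights from the abstract: lambda_i = sigma * Phi^{-1}(1 - i*q/(2p))
    i = np.arange(1, p + 1)
    return sigma * norm.ppf(1 - i * q / (2 * p))

def prox_sorted_l1(v, lam):
    # Prox of J(x) = sum_i lam_i * |x|_(i): sort |v| in decreasing order,
    # subtract the rank-dependent weights, restore monotonicity by pooling
    # adjacent violating blocks (a stack-based PAVA pass), and clip at zero.
    sign, u = np.sign(v), np.abs(v)
    order = np.argsort(-u)
    diff = u[order] - lam
    blocks = []                        # stack of (start, end, block average)
    for i, d in enumerate(diff):
        start, end, avg = i, i, d
        while blocks and blocks[-1][2] <= avg:
            s, e, a = blocks.pop()     # merge with the previous block
            n1, n2 = e - s + 1, end - start + 1
            avg = (a * n1 + avg * n2) / (n1 + n2)
            start = s
        blocks.append((start, end, avg))
    x_sorted = np.zeros_like(u)
    for s, e, a in blocks:
        x_sorted[s:e + 1] = max(a, 0.0)
    x = np.empty_like(v)
    x[order] = x_sorted                # undo the sort
    return sign * x

def slope(X, y, lam, n_iter=500):
    # Proximal gradient on (1/2)||y - Xb||^2 + sum_i lam_i * |b|_(i).
    L = np.linalg.norm(X, 2) ** 2      # Lipschitz constant of the gradient
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        b = prox_sorted_l1(b - X.T @ (X @ b - y) / L, lam / L)
    return b
```

A fixed step size $1/L$ and a fixed iteration count keep the sketch short; a production solver would typically add FISTA-style acceleration and a principled stopping rule.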
