The Smooth-Lasso and other $\ell_1+\ell_2$-penalized methods

25 March 2010

Abstract

We consider the linear regression problem in the high-dimensional setting, i.e., the number $p$ of covariates can be much larger than the sample size $n$. In such a situation one often assumes sparsity of the regression vector, i.e., that it contains many zero components. We propose a Lasso-type estimator $\hat{\beta}^{Quad}$ (where '$Quad$' stands for quadratic), which is based on two penalty terms. The first is the $\ell_1$ norm of the regression coefficients, used to exploit the sparsity of the regression vector as the Lasso estimator does, whereas the second is a quadratic penalty term introduced to capture additional information on the setting of the problem. We detail two special cases: the Elastic-Net $\hat{\beta}^{EN}$, introduced by Zou and Hastie, which deals with sparse problems where correlations between variables may exist; and the S-Lasso $\hat{\beta}^{SL}$, which addresses sparse problems where successive regression coefficients are known to vary slowly (in some situations, this can also be interpreted in terms of correlations between successive coefficients). From a theoretical point of view, we establish variable selection consistency results and show that $\hat{\beta}^{Quad}$ achieves a Sparsity Inequality, i.e., a bound in terms of the number of non-zero components of the 'true' regression vector. These results are provided under a weaker assumption on the Gram matrix than the one used by the Lasso. In some (bad) situations this guarantees a significant improvement over the Lasso. Furthermore, a simulation study shows that, in terms of estimation accuracy, the S-Lasso $\hat{\beta}^{SL}$ performs better than known methods such as the Lasso, the Elastic-Net $\hat{\beta}^{EN}$, and the Fused-Lasso (introduced by Tibshirani et al.), specifically when the regression vector is 'smooth', i.e., when the variations between successive coefficients of the unknown regression parameter are small. The study also reveals that the theoretical calibration of the tuning parameters yields an S-Lasso solution whose performance is close to that of the S-Lasso with tuning parameters chosen by 10-fold cross-validation.
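
As a sketch of the kind of criterion described above, one may write the estimator as a penalized least-squares minimizer. The notation here is assumed rather than taken from the abstract: $Y \in \mathbb{R}^n$ is the response, $X$ the $n \times p$ design matrix, $\lambda, \mu \ge 0$ the two tuning parameters, $J$ a symmetric positive semidefinite $p \times p$ matrix, and the $1/n$ normalization of the loss is one possible convention.

$$
\hat{\beta}^{Quad} \in \arg\min_{\beta \in \mathbb{R}^p} \left\{ \frac{1}{n} \| Y - X\beta \|_2^2 + \lambda \|\beta\|_1 + \mu \, \beta^\top J \beta \right\}
$$

Under this assumed form, taking $J$ to be the identity gives the quadratic term $\mu \|\beta\|_2^2$, which corresponds to an Elastic-Net-type penalty, while taking $J = D^\top D$ with $D$ the first-difference matrix gives

$$
\mu \, \beta^\top D^\top D \beta = \mu \sum_{j=2}^{p} (\beta_j - \beta_{j-1})^2 ,
$$

a penalty on squared successive differences that matches the 'slowly varying coefficients' setting of the S-Lasso. By contrast, the Fused-Lasso penalizes the absolute differences $|\beta_j - \beta_{j-1}|$ rather than their squares.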
