De-biasing the Lasso: Optimal Sample Size for Gaussian Designs
Performing statistical inference in high-dimensional models is an outstanding challenge. A major source of difficulty is the absence of precise information on the distribution of high-dimensional regularized estimators.

Here, we consider linear regression in the high-dimensional regime $p \gg n$ and the Lasso estimator. In this context, we would like to perform inference on a high-dimensional parameter vector $\theta^* \in \mathbb{R}^p$. Important progress has been achieved in computing confidence intervals and p-values for single coordinates $\theta^*_i$, $i \in \{1, \dots, p\}$. A key role in these new inferential methods is played by a certain de-biased (or de-sparsified) estimator $\widehat{\theta}^{\rm d}$ that is constructed from the Lasso estimator. Earlier work establishes that, under suitable assumptions on the design matrix, the coordinates of $\widehat{\theta}^{\rm d}$ are asymptotically Gaussian provided the true parameter vector $\theta^*$ is $s_0$-sparse with $s_0 = o(\sqrt{n}/\log p)$.

The condition $s_0 = o(\sqrt{n}/\log p)$ is considerably stronger than the one required for consistent estimation, namely $s_0 = o(n/\log p)$. Here we consider Gaussian designs with known or unknown population covariance. When the covariance is known, we prove that the de-biased estimator is asymptotically Gaussian under the nearly optimal condition $s_0 = o(n/(\log p)^2)$. Note that \emph{earlier work was limited to $s_0 = o(\sqrt{n}/\log p)$ even for perfectly known covariance.} The same conclusion holds if the population covariance is unknown but can be estimated sufficiently well, e.g. because its inverse is very sparse. For intermediate regimes, we describe the trade-off between sparsity in the coefficients $\theta^*$, and sparsity in the inverse covariance of the design.
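As a concrete illustration of the construction the abstract refers to, below is a minimal simulation sketch of the de-biased Lasso in the known-covariance case, where the de-biasing matrix is taken to be $M = \Sigma^{-1}$ and each coordinate of $\widehat{\theta}^{\rm d}$ is approximately Gaussian around $\theta^*_i$. The simulation parameters, the regularization choice $\lambda \asymp \sigma\sqrt{2\log p/n}$, the identity covariance, and all variable names are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch: de-biased Lasso with known population covariance Sigma,
# de-biasing matrix M = Sigma^{-1}. Setup and names are illustrative only.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p, s0 = 600, 1000, 10            # sample size, dimension, sparsity (assumed values)
Sigma = np.eye(p)                   # known population covariance (identity, for simplicity)
theta_star = np.zeros(p)
theta_star[:s0] = 2.0               # s0-sparse true parameter vector
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)  # Gaussian design
sigma = 1.0
y = X @ theta_star + sigma * rng.standard_normal(n)

# Lasso with the usual theoretical-order regularization sigma * sqrt(2 log p / n)
lam = sigma * np.sqrt(2 * np.log(p) / n)
theta_hat = Lasso(alpha=lam, fit_intercept=False, max_iter=10**4).fit(X, y).coef_

# De-biased estimator: theta_d = theta_hat + (1/n) * M X^T (y - X theta_hat),
# with M = Sigma^{-1} since the covariance is known.
M = np.linalg.inv(Sigma)
theta_d = theta_hat + (M @ X.T @ (y - X @ theta_hat)) / n

# Coordinate i is approximately N(theta_star_i, sigma^2 [M Sigma M^T]_{ii} / n),
# which yields per-coordinate confidence intervals, e.g. for coordinate 0:
se = sigma * np.sqrt((M @ Sigma @ M.T)[0, 0] / n)
ci = (theta_d[0] - 1.96 * se, theta_d[0] + 1.96 * se)
print(f"95% CI for theta_0: ({ci[0]:.3f}, {ci[1]:.3f})")
```

In this sketch the Gaussian approximation is used to form a standard 95% interval; the paper's contribution concerns how large the sparsity $s_0$ may be, relative to $n$ and $p$, for such an approximation to remain valid.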