Hypothesis Testing in High-Dimensional Regression under the Gaussian Random Design Model: Asymptotic Theory

IEEE Transactions on Information Theory (IEEE Trans. Inf. Theory), 2013

17 January 2013

Abstract

We consider linear regression in the high-dimensional regime in which the number of observations $n$ is smaller than the number of parameters $p$. A very successful approach in this setting uses $\ell_1$-penalized least squares (a.k.a. the Lasso) to search for a subset of $s_0 < n$ parameters that best explain the data, while setting the other parameters to zero. A considerable amount of work has been devoted to characterizing the estimation and model selection problems within this approach. In this paper we consider instead the fundamental, but far less understood, question of statistical significance. We study this problem under the random design model in which the rows of the design matrix are i.i.d. and drawn from a high-dimensional Gaussian distribution. This situation arises, for instance, in learning high-dimensional Gaussian graphical models. Leveraging an asymptotic distributional characterization of regularized least squares estimators, we develop a procedure for computing p-values and hence assessing statistical significance for hypothesis testing. We characterize the statistical power of this procedure and evaluate it on synthetic and real data, comparing it with earlier proposals. Finally, we provide an upper bound on the minimax power of tests with a given significance level and show that our proposed procedure achieves this bound in the case of design matrices with i.i.d. Gaussian entries.
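
As a rough illustration of the last step of such a procedure, the sketch below turns a per-coefficient test statistic into a two-sided p-value and a reject/accept decision. It assumes a statistic that is approximately standard normal under the null hypothesis that the coefficient is zero. The abstract does not specify the paper's actual statistic, so the function names and the example values are hypothetical and this is not the authors' procedure.

```python
import math


def two_sided_p_value(z: float) -> float:
    """Two-sided p-value for a statistic assumed ~ N(0, 1) under the null."""
    # For standard normal Z, P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))


def reject(z_scores, alpha=0.05):
    """Reject the null for each coefficient whose p-value is at most alpha."""
    return [two_sided_p_value(z) <= alpha for z in z_scores]


# Hypothetical standardized statistics for three coefficients (illustrative only).
z_scores = [0.3, -2.5, 4.1]
p_values = [two_sided_p_value(z) for z in z_scores]
decisions = reject(z_scores)
```

Here `alpha` plays the role of the significance level mentioned in the abstract. Constructing a statistic whose null distribution is actually known in the regime $n < p$ is the hard part, and it is what the paper addresses.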
