37
29

Testing linear hypotheses in high-dimensional regressions

Abstract

For a multivariate linear model, Wilk's likelihood ratio test (LRT) constitutes one of the cornerstone tools. However, the computation of its quantiles under the null or the alternative requires complex analytic approximations and more importantly, these distributional approximations are feasible only for moderate dimension of the dependent variable, say p20p\le 20. On the other hand, assuming that the data dimension pp as well as the number qq of regression variables are fixed while the sample size nn grows, several asymptotic approximations are proposed in the literature for Wilk's \bLa\bLa including the widely used chi-square approximation. In this paper, we consider necessary modifications to Wilk's test in a high-dimensional context, specifically assuming a high data dimension pp and a large sample size nn. Based on recent random matrix theory, the correction we propose to Wilk's test is asymptotically Gaussian under the null and simulations demonstrate that the corrected LRT has very satisfactory size and power, surely in the large pp and large nn context, but also for moderately large data dimensions like p=30p=30 or p=50p=50. As a byproduct, we give a reason explaining why the standard chi-square approximation fails for high-dimensional data. We also introduce a new procedure for the classical multiple sample significance test in MANOVA which is valid for high-dimensional data.

View on arXiv
Comments on this paper