Statistical inference with F-statistics when fitting simple models to high-dimensional data

Abstract

We study linear subset regression in the context of the high-dimensional overall model $y = \vartheta + \theta' z + \epsilon$ with univariate response $y$ and a $d$-vector of random regressors $z$, independent of $\epsilon$. Here, "high-dimensional" means that the number $d$ of available explanatory variables is much larger than the number $n$ of observations. We consider simple linear sub-models where $y$ is regressed on a set of $p$ regressors given by $x = M'z$, for some $d \times p$ matrix $M$ of full rank $p < n$. The corresponding simple model, i.e., $y = \alpha + \beta' x + e$, can be justified by imposing appropriate restrictions on the unknown parameter $\theta$ in the overall model; otherwise, this simple model can be grossly misspecified. In this paper, we establish asymptotic validity of the standard $F$-test on the surrogate parameter $\beta$, in an appropriate sense, even when the simple model is misspecified.
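The setup can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the paper's procedure: the regressors $z$ are taken to be standard Gaussian, $M$ is assumed to select the first $p$ coordinates of $z$, and the surrogate parameter is assumed to be the population least-squares coefficient, which for identity covariance reduces to $(M'M)^{-1}M'\theta$. The helper `f_statistic` and all numerical values are hypothetical choices made for illustration.

```python
import numpy as np

def f_statistic(X, y, beta0):
    """Classical F-statistic for H0: beta = beta0 in y = alpha + beta'x + e."""
    n, p = X.shape
    design = np.column_stack([np.ones(n), X])        # add intercept column
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    beta_hat = coef[1:]                              # slope estimates
    resid = y - design @ coef
    sigma2_hat = resid @ resid / (n - p - 1)         # residual variance estimate
    Xc = X - X.mean(axis=0)                          # centered regressors
    diff = beta_hat - beta0
    return float(diff @ (Xc.T @ Xc) @ diff / (p * sigma2_hat))

rng = np.random.default_rng(0)
n, d, p = 50, 500, 3                                 # d much larger than n, p < n

theta = rng.normal(size=d) / np.sqrt(d)              # hypothetical overall-model parameter
M = np.eye(d)[:, :p]                                 # assumption: M picks the first p coordinates
Z = rng.normal(size=(n, d))                          # assumption: z ~ N(0, I_d)
y = 1.0 + Z @ theta + rng.normal(size=n)             # overall model with intercept 1.0

X = Z @ M                                            # rows are x_i' = (M' z_i)'
# Assumed surrogate parameter: population least-squares coefficient of y on x
# when z has identity covariance.
beta_surrogate = np.linalg.solve(M.T @ M, M.T @ theta)

F = f_statistic(X, y, beta_surrogate)
print(F)
```

In this Gaussian design the sub-model happens to be correctly specified, so `F` would be compared against quantiles of the $F$-distribution with $p$ and $n-p-1$ degrees of freedom. Replacing the distribution of `Z` with a non-Gaussian one is one way to explore the misspecified case that the abstract refers to.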
