
Concentration Based Inference for High Dimensional (Generalized) Regression Models: New Phenomena in Hypothesis Testing

Abstract

We develop simple and non-asymptotically justified methods for hypothesis testing about the coefficients $\theta^{*}\in\mathbb{R}^{p}$ in high dimensional (generalized) regression models, where $p$ can exceed the sample size $n$. We consider $H_{0}:\,h(\theta^{*})=\mathbf{0}_{m}$ against $H_{1}:\,h(\theta^{*})\neq\mathbf{0}_{m}$, where $m$ can be as large as $p$ and $h$ can be nonlinear in $\theta^{*}$. Our test statistic is based on the sample score vector evaluated at an estimate $\hat{\theta}_{\alpha}$ that satisfies $h(\hat{\theta}_{\alpha})=\mathbf{0}_{m}$, where $\alpha$ is the prespecified Type I error. We provide non-asymptotic control on the Type I and Type II errors of the score test, as well as confidence regions. By exploiting the concentration phenomenon for Lipschitz functions, the key component reflecting the "dimension complexity" in our non-asymptotic thresholds uses a Monte-Carlo approximation to "mimic" the expectation around which the statistic concentrates, and it automatically captures the dependencies between the coordinates. The novelty of our methods is that their validity does not rely on good behavior of $\left\Vert \hat{\theta}_{\alpha}-\theta^{*}\right\Vert_{2}$ or even $n^{-1/2}\left\Vert X\left(\hat{\theta}_{\alpha}-\theta^{*}\right)\right\Vert_{2}$, non-asymptotically or asymptotically. Most interestingly, we discover phenomena that run opposite to the existing literature: (1) more restrictions (larger $m$) in $H_{0}$ make our procedures more powerful; (2) whether or not $\theta^{*}$ is sparse, our procedures can detect alternatives with probability at least $1-\text{Type II error}$ when $p\geq n$ and $m>p-n$; (3) the coverage probability of our procedures is not affected by how sparse $\theta^{*}$ is. The proposed procedures are evaluated in simulation studies, where the empirical evidence supports our key insights.
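To make the abstract's idea concrete, the following is a minimal illustrative sketch (not the paper's exact procedure) of a Monte-Carlo calibrated score test in the simplest special case: a Gaussian linear model $y = X\theta^{*} + \varepsilon$ with known noise level, testing $H_{0}:\theta^{*}=\mathbf{0}_{p}$ (i.e., $h$ is the identity, so the null-restricted estimate is simply zero). All variable names, the sup-norm choice, and the known-$\sigma$ assumption are hypothetical simplifications for illustration; the Monte-Carlo threshold mimics the null distribution of the score and automatically preserves the dependence between its coordinates.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative setup: p > n is allowed.
n, p, sigma, alpha = 50, 200, 1.0, 0.05
X = rng.standard_normal((n, p))

def score_stat(y, X):
    # Sup-norm of the sample score vector X^T y / n, evaluated at the
    # null-restricted estimate (here theta = 0 under H0: theta* = 0).
    return np.max(np.abs(X.T @ y / len(y)))

def mc_threshold(X, sigma, alpha, B=2000, rng=rng):
    # Monte-Carlo approximation of the (1 - alpha) quantile of the null
    # statistic: simulating X^T eps / n directly, rather than bounding each
    # coordinate separately, captures the dependence across coordinates.
    stats = [score_stat(sigma * rng.standard_normal(X.shape[0]), X)
             for _ in range(B)]
    return np.quantile(stats, 1 - alpha)

t = mc_threshold(X, sigma, alpha)

# Reject H0 when the observed score statistic exceeds the MC threshold;
# under H0 the rejection rate is close to alpha by construction.
y_null = sigma * rng.standard_normal(n)
reject = score_stat(y_null, X) > t
```

In this toy version, a strong signal in even one coordinate of $\theta^{*}$ pushes the corresponding coordinate of the score well above the threshold, so the test rejects with high probability.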
