Concentration Based Inference in High Dimensional Generalized Regression Models (I: Statistical Guarantees)
We develop simple, non-asymptotically justified methods for hypothesis testing about the coefficients ($\theta^{*}\in\mathbb{R}^{p}$) in high dimensional generalized regression models where $p$ can exceed the sample size. Given a function $h:\mathbb{R}^{p}\mapsto\mathbb{R}^{m}$, we consider $H_{0}: h(\theta^{*})=\mathbf{0}_{m}$ against $H_{1}: h(\theta^{*})\neq\mathbf{0}_{m}$, where $m$ can be any integer in $[1,p]$ and $h$ can be nonlinear in $\theta^{*}$. Our test statistic is based on the sample "quasi score" vector evaluated at an estimate $\hat{\theta}_{\alpha}$ that satisfies $h(\hat{\theta}_{\alpha})=\mathbf{0}_{m}$, where $\alpha$ is the prespecified Type I error. By exploiting the concentration phenomenon in Lipschitz functions, the key component reflecting the dimension complexity in our non-asymptotic thresholds uses a Monte Carlo approximation to mimic the expectation around which the relevant quantity concentrates, and it automatically captures the dependencies between the coordinates. We provide probabilistic guarantees in terms of the Type I and Type II errors for the quasi score test. Confidence regions are also constructed for the population quasi score vector evaluated at $\theta^{*}$. The first set of our results is specific to the standard Gaussian linear regression model; the second set allows for reasonably flexible forms of non-Gaussian responses, heteroscedastic noise, and nonlinearity in the regression coefficients, while only requiring the correct specification of the conditional means $\mathbb{E}(Y_{i}\mid X_{i})$. The novelty of our methods is that their validity does not rely on good behavior of $\|\hat{\theta}_{\alpha}-\theta^{*}\|_{2}$ (or even $n^{-1/2}\|X(\hat{\theta}_{\alpha}-\theta^{*})\|_{2}$ in the linear regression case), either nonasymptotically or asymptotically.
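To make the thresholding idea concrete, here is a minimal sketch for the simplest Gaussian linear case. It is an illustration under stated assumptions, not the paper's exact procedure: the null is the simple hypothesis that the coefficient vector equals a given `theta0`, the noise level `sigma` is taken as known, the score is measured in the sup norm, and the Monte Carlo error of the simulated mean is ignored. The function name `quasi_score_test` and all parameter names are hypothetical. The threshold adds a standard Gaussian concentration term for Lipschitz functions to a simulated expectation.

```python
import numpy as np

def quasi_score_test(X, y, theta0, sigma, alpha=0.05, n_draws=2000, seed=0):
    """Illustrative score-type test of H0: theta* = theta0 in y = X theta* + eps,
    eps ~ N(0, sigma^2 I_n). Hypothetical helper, not the paper's procedure."""
    rng = np.random.default_rng(seed)
    n, p = X.shape

    # Sample score vector X^T (y - X theta0) / n, summarized by its sup norm.
    stat = np.max(np.abs(X.T @ (y - X @ theta0))) / n

    # Monte Carlo approximation of E || X^T eps / n ||_inf under the null.
    # Reusing the same X in every draw keeps the dependence between coordinates.
    eps = sigma * rng.standard_normal((n_draws, n))
    mc_mean = np.mean(np.max(np.abs(eps @ X), axis=1)) / n

    # eps -> || X^T eps / n ||_inf is Lipschitz in the Euclidean norm with
    # constant max_j ||X_j||_2 / n, so Gaussian concentration gives
    # P(f >= E f + t) <= exp(-t^2 / (2 sigma^2 lip^2)).
    lip = np.max(np.linalg.norm(X, axis=0)) / n
    threshold = mc_mean + sigma * lip * np.sqrt(2.0 * np.log(1.0 / alpha))

    return stat, threshold, bool(stat > threshold)
```

With the design held fixed, the simulated mean reflects the correlation structure of the columns of `X`, which is why no union bound over the `p` coordinates is needed in this sketch.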