
On Model Identification and Out-of-Sample Prediction of Principal Component Regression: Applications to Synthetic Controls

Abstract

We analyze principal component regression (PCR) in a high-dimensional error-in-variables setting with fixed design. Under suitable conditions, we show that PCR consistently identifies the unique model with minimum $\ell_2$-norm and is near minimax optimal. These results enable us to establish non-asymptotic out-of-sample prediction guarantees that improve upon the best known rates. In our analysis, we introduce a natural linear algebraic condition between the in- and out-of-sample covariates, which allows us to avoid distributional assumptions. Our simulations illustrate the importance of this condition for generalization, even under covariate shifts. As a byproduct, our results also lead to novel results for the synthetic controls literature, a leading approach for policy evaluation. In particular, our minimax results suggest the attractiveness of PCR-based methods amongst the numerous variants. To the best of our knowledge, our prediction guarantees for the fixed design setting have been elusive in both the high-dimensional error-in-variables and synthetic controls literatures.
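For readers unfamiliar with PCR, the following is a minimal sketch of the textbook procedure in NumPy, not necessarily the exact estimator analyzed in the paper. The function name `pcr_fit`, the rank `k`, the noise levels, and the toy data are all assumptions made for illustration. The sketch truncates the noisy covariate matrix to its top `k` singular components and solves least squares on that truncation, which yields a coefficient vector lying in the span of the retained right singular vectors (the minimum $\ell_2$-norm solution for the truncated matrix).

```python
import numpy as np


def pcr_fit(Z, y, k):
    """Textbook PCR: least squares on the rank-k truncation of Z (illustrative)."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    # Keep only the k leading singular components (hard thresholding).
    U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]
    # Minimum l2-norm least-squares solution for the rank-k matrix:
    # beta = V_k diag(1/s_k) U_k^T y, which lies in the span of V_k.
    return Vt_k.T @ ((U_k.T @ y) / s_k)


# Toy error-in-variables data (all sizes and noise levels are assumptions).
rng = np.random.default_rng(0)
n, p, r = 50, 100, 3
B = rng.normal(size=(r, p))                      # shared right factor
X = rng.normal(size=(n, r)) @ B                  # low-rank latent covariates
beta_star = rng.normal(size=p)
y = X @ beta_star + 0.1 * rng.normal(size=n)     # noisy responses
Z = X + 0.1 * rng.normal(size=(n, p))            # noisy observed covariates

beta_hat = pcr_fit(Z, y, k=r)

# Out-of-sample covariates drawn from the same row space as X. This is a
# choice made for this toy example, not a statement of the paper's condition.
X_new = rng.normal(size=(20, r)) @ B
y_pred = X_new @ beta_hat
```

In this sketch the out-of-sample rows share the row space of the in-sample covariates, which is one simple way a linear algebraic relationship between the two sets can hold without any distributional assumption.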
