On Principal Component Regression in a High-Dimensional
Error-in-Variables Setting

We analyze the classical method of principal component regression (PCR) in a high-dimensional error-in-variables setting. Here, the observed covariates are not only noisy and contain missing values, but the number of covariates can also exceed the sample size. Under suitable conditions, we establish that PCR identifies the unique linear model parameter with minimum ℓ2-norm, and derive non-asymptotic ℓ2-rates of convergence that show its consistency. Furthermore, we develop an algorithm for out-of-sample predictions in the presence of corrupted data that uses PCR as a key subroutine, and provide non-asymptotic guarantees on its prediction performance. Notably, our results do not require the out-of-sample covariates to follow the same distribution as the in-sample covariates, but only that they obey a simple linear algebraic constraint. We provide simulations that illustrate our theoretical results.
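As context for the abstract above, the PCR procedure it studies can be sketched as follows: hard-threshold the observed covariate matrix to its top-k singular components (a denoising step for the error-in-variables), then take the minimum ℓ2-norm least-squares solution on the denoised covariates. This is a minimal illustrative sketch, not the paper's exact algorithm; the function name `pcr`, the rank choice, and the toy data are all assumptions for illustration.

```python
import numpy as np

def pcr(X_obs, y, k):
    """Illustrative principal component regression via rank-k SVD truncation.

    X_obs : (n, p) noisy covariate matrix (p may exceed n)
    y     : (n,) response vector
    k     : number of principal components to retain
    Returns a parameter estimate; pinv yields the minimum l2-norm
    least-squares solution on the rank-k approximation.
    """
    U, s, Vt = np.linalg.svd(X_obs, full_matrices=False)
    # Hard singular value thresholding: keep the top-k singular triplets.
    X_k = (U[:, :k] * s[:k]) @ Vt[:k, :]
    # Minimum-norm regression on the denoised covariates.
    return np.linalg.pinv(X_k) @ y

# Toy example (hypothetical data): n=50 samples, p=100 covariates, true rank 3.
rng = np.random.default_rng(0)
n, p, r = 50, 100, 3
X_true = rng.normal(size=(n, r)) @ rng.normal(size=(r, p))
beta = rng.normal(size=p)
y = X_true @ beta
X_obs = X_true + 0.1 * rng.normal(size=(n, p))  # additive measurement error
beta_hat = pcr(X_obs, y, k=r)
print(np.linalg.norm(X_true @ beta_hat - y) / np.linalg.norm(y))
```

In this low-rank regime the noise perturbs the trailing singular values far more than the leading ones, so truncation recovers most of the signal subspace and the in-sample relative prediction error printed at the end is small.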