Resampling Sensitivity of High-Dimensional PCA

The stability and sensitivity of statistical methods and algorithms with respect to their input data is an important problem in machine learning and statistics. How an algorithm performs when its data are resampled is a fundamental measure of its stability and is closely related to the algorithm's generalization and privacy properties. In this paper, we study the resampling sensitivity of principal component analysis (PCA). Given a random matrix, consider the matrix obtained from it by resampling a number of randomly chosen entries, and compare the principal components of the original and resampled matrices. In the proportional growth regime, where the two dimensions of the matrix grow proportionally, we establish the sharp threshold for the sensitivity/stability transition of PCA. When the number of resampled entries is well above this threshold, the two principal components are asymptotically orthogonal. On the other hand, when it is well below the threshold, they are asymptotically collinear. In other words, we show that PCA is sensitive to the input data in the sense that resampling even a negligible portion of the input may completely change the output.
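The resampling experiment described in the abstract can be sketched numerically. The following is a minimal illustration rather than the paper's construction. It assumes i.i.d. standard Gaussian entries, rows as samples, no centering, and the principal component taken as the leading right singular vector. The names `top_pc`, `resample_entries`, and `overlap` are hypothetical helpers introduced here.

```python
import numpy as np


def top_pc(X):
    """Leading right singular vector of X (assumed definition of the top
    principal component; rows are treated as samples, no centering)."""
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return vt[0]


def resample_entries(X, k, rng):
    """Return a copy of X with k distinct, uniformly chosen entries
    redrawn from the (assumed) standard Gaussian entry distribution."""
    Y = X.copy()
    flat_idx = rng.choice(X.size, size=k, replace=False)
    rows, cols = np.unravel_index(flat_idx, X.shape)
    Y[rows, cols] = rng.standard_normal(k)
    return Y


def overlap(n, p, k, seed=0):
    """Absolute inner product between the top principal components of a
    random n-by-p matrix and of its k-entry resampled copy.
    Values near 1 mean nearly collinear, values near 0 nearly orthogonal."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))
    X_resampled = resample_entries(X, k, rng)
    return abs(top_pc(X) @ top_pc(X_resampled))
```

Calling `overlap(n, p, k)` over a range of `k` for fixed `n` and `p` gives a rough empirical picture of the transition between the stable and sensitive regimes. At small matrix sizes the transition is blurred, so such a simulation only illustrates the asymptotic statement and does not verify it.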