
Classification of high-dimensional data with spiked covariance matrix structure

Main: 18 pages · Bibliography: 5 pages · Appendix: 15 pages · 17 tables
Abstract

We study the classification problem for high-dimensional data with $n$ observations on $p$ features, where the $p \times p$ covariance matrix $\Sigma$ exhibits a spiked eigenvalue structure and the vector $\zeta$, given by the difference between the {\em whitened} mean vectors, is sparse. We analyze an adaptive classifier (adaptive with respect to the sparsity $s$) that first performs dimension reduction on the feature vectors prior to classification in the dimensionally reduced space: the classifier whitens the data, then screens the features by keeping only those corresponding to the $s$ largest coordinates of $\zeta$, and finally applies the Fisher linear discriminant on the selected features. Leveraging recent results on entrywise matrix perturbation bounds for covariance matrices, we show that the resulting classifier is Bayes optimal whenever $n \rightarrow \infty$ and $s \sqrt{n^{-1} \ln p} \rightarrow 0$. Notably, our theory also guarantees Bayes optimality for the corresponding quadratic discriminant analysis (QDA). Experimental results on real and synthetic data further indicate that the proposed approach is competitive with state-of-the-art methods while operating on a substantially lower-dimensional representation.
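The three-step pipeline described above (whiten, screen by the $s$ largest coordinates of $\zeta$, then apply Fisher LDA on the retained features) can be sketched as follows. This is a hypothetical illustration, not the paper's estimator: it uses the pooled sample covariance with a symmetric inverse square root for whitening and assumes $n > p$ so that the covariance estimate is invertible; the function name and interface are our own.

```python
import numpy as np

def whiten_screen_lda(X0, X1, s):
    """Illustrative sketch: whiten the data, screen the s largest
    coordinates of the whitened mean difference zeta, then classify
    with the Fisher linear discriminant on the selected features."""
    # Pooled sample covariance over both classes
    X = np.vstack([X0, X1])
    Sigma = np.cov(X, rowvar=False)
    # Whitening transform W = Sigma^{-1/2} via eigendecomposition
    vals, vecs = np.linalg.eigh(Sigma)
    W = vecs @ np.diag(vals ** -0.5) @ vecs.T
    # Whitened class means and their difference zeta
    mu0, mu1 = W @ X0.mean(axis=0), W @ X1.mean(axis=0)
    zeta = mu1 - mu0
    # Screening: keep the s coordinates of zeta largest in magnitude
    S = np.argsort(-np.abs(zeta))[:s]
    mid = 0.5 * (mu0 + mu1)
    def classify(x):
        # Fisher rule on whitened, screened features:
        # assign class 1 iff zeta_S . (Wx - midpoint)_S > 0
        z = (W @ x)[S] - mid[S]
        return int(zeta[S] @ z > 0)
    return classify
```

In the high-dimensional regime the paper targets, the pooled sample covariance is singular and the spiked structure of $\Sigma$ would instead be exploited to estimate the whitening transform; the sketch only conveys the order of operations.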
