In this paper, we address two problems in unsupervised subspace learning: 1) how to automatically identify the feature dimension of the learned subspace, and 2) how to learn the underlying subspace in the presence of corruptions such as Gaussian noise. We show that these two problems are two sides of the same coin, i.e., both can be solved by removing possible errors from the training data. To achieve this, we propose a new method, called Principal Coefficients Embedding (PCE), that simultaneously learns a clean data set $D_0$ and a linear representation (denoted by $C$) of $D_0$ from the input data $D$. By embedding $C$ into an $m'$-dimensional space, PCE obtains a projection matrix that preserves some desirable properties of the inputs, where $m'$ is exactly the rank of $C$. PCE has three advantages: 1) it can automatically determine the feature dimension even when the data are sampled from a union of multiple linear subspaces; 2) it is robust to various types of noise and to real disguises; 3) it has a closed-form solution and can be computed very fast. Extensive experimental results show the superiority of PCE on a range of databases with respect to classification accuracy, robustness, and efficiency.
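Since the abstract describes the pipeline only at a high level, the following is a minimal illustrative sketch of how such a method could operate, assuming (hypothetically) that a truncated SVD recovers the clean data set $D_0$ and the coefficient matrix $C$; the function name `pce_sketch` and the noise threshold `tau` are placeholders for illustration, not the paper's actual formulation.

```python
import numpy as np

def pce_sketch(D, tau=1.0):
    """A minimal sketch of the pipeline the abstract describes.

    Assumptions (not taken from the paper): the clean data set D0 is
    recovered by discarding singular values of D below a noise threshold
    tau, and the linear representation C is built from the corresponding
    right singular vectors, so that D0 = D0 @ C and rank(C) = m'.
    """
    U, s, Vt = np.linalg.svd(D, full_matrices=False)
    keep = s > tau                           # singular values above the noise level
    m_prime = int(keep.sum())                # feature dimension m' = rank(C)
    D0 = (U[:, keep] * s[keep]) @ Vt[keep]   # clean data set
    C = Vt[keep].T @ Vt[keep]                # linear representation: D0 = D0 @ C
    # Embed C into an m'-dimensional space: the top-m' eigenvectors of C
    # give low-dimensional coordinates for the n samples.
    _, V = np.linalg.eigh(C)                 # eigenvalues in ascending order
    embedding = V[:, -m_prime:]              # n x m' sample coordinates
    return D0, C, embedding

# Example: 100 noisy samples of 50-dimensional data near a 5-dimensional subspace;
# the detected feature dimension m' should come out as 5.
D = np.random.randn(50, 5) @ np.random.randn(5, 100) + 0.01 * np.random.randn(50, 100)
D0, C, Y = pce_sketch(D, tau=0.5)
```

Note that in this sketch the feature dimension is a byproduct of the rank of $C$ rather than a user-specified parameter, which mirrors the automatic dimension selection claimed in the abstract.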