Efficient L1-Norm Principal-Component Analysis via Bit Flipping

IEEE Transactions on Signal Processing (IEEE TSP), 2016
Abstract

It was shown recently that the $K$ L1-norm principal components (L1-PCs) of a real-valued data matrix $\mathbf X \in \mathbb R^{D \times N}$ ($N$ data samples of $D$ dimensions) can be exactly calculated with cost $\mathcal{O}(2^{NK})$ or, when advantageous, $\mathcal{O}(N^{dK - K + 1})$, where $d = \mathrm{rank}(\mathbf X)$, $K < d$ [1],[2]. In applications where $\mathbf X$ is large (e.g., "big" data of large $N$ and/or "heavy" data of large $d$), these costs are prohibitive. In this work, we present a novel suboptimal algorithm for the calculation of the $K < d$ L1-PCs of $\mathbf X$ of cost $\mathcal O(ND \min\{N, D\} + N^2(K^4 + dK^2) + dNK^3)$, which is comparable to that of standard (L2-norm) PC analysis. Our theoretical and experimental studies show that the proposed algorithm calculates the exact optimal L1-PCs with high frequency and achieves higher value in the L1-PC optimization metric than any known alternative algorithm of comparable computational cost. The superiority of the calculated L1-PCs over standard L2-PCs (singular vectors) in characterizing potentially faulty data/measurements is demonstrated with experiments on data dimensionality reduction and disease diagnosis from genomic data.
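
As a rough illustration of the bit-flipping idea, the sketch below handles only the single-component case ($K = 1$). It assumes the standard reformulation, not spelled out in the abstract, that the L1-PC maximizing $\|\mathbf X^\top \mathbf q\|_1$ over unit-norm $\mathbf q$ can be written as $\mathbf q = \mathbf X \mathbf b / \|\mathbf X \mathbf b\|_2$ for the sign vector $\mathbf b \in \{\pm 1\}^N$ that maximizes $\|\mathbf X \mathbf b\|_2$. This is not the paper's algorithm as specified: the function name `l1_pc_bitflip`, the SVD-based initialization, the stopping tolerance, and the `max_iter` cap are all illustrative assumptions.

```python
import numpy as np

def l1_pc_bitflip(X, max_iter=1000):
    """Greedy single-bit-flip search for one L1-PC (K = 1). Illustrative sketch only."""
    # Assumed initialization: signs of the dominant right singular vector of X.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    b = np.sign(Vt[0])
    b[b == 0] = 1.0          # break ties so that b stays in {-1, +1}^N

    G = X.T @ X              # N x N Gram matrix, so ||X b||^2 = b^T G b
    for _ in range(max_iter):
        # Flipping bit n changes b^T G b by -4 * b_n * (G b)_n + 4 * G_nn.
        gain = -4.0 * b * (G @ b) + 4.0 * np.diag(G)
        n = int(np.argmax(gain))
        if gain[n] <= 1e-12:  # no single flip improves the objective
            break
        b[n] = -b[n]

    q = X @ b                # assumes X is nonzero, so q has nonzero norm
    return q / np.linalg.norm(q), b

# Usage: compare the L1 metric of the result with that of the L2-PC.
rng = np.random.default_rng(0)
X = rng.standard_normal((3, 8))          # D = 3 dimensions, N = 8 samples
q, b = l1_pc_bitflip(X)
u1 = np.linalg.svd(X, full_matrices=False)[0][:, 0]
print("L1 metric, bit-flip sketch:", np.abs(X.T @ q).sum())
print("L1 metric, L2-PC:          ", np.abs(X.T @ u1).sum())
```

Each iteration forms $\mathbf G \mathbf b$ at a cost of $\mathcal O(N^2)$, and the search stops at a sign vector that no single flip can improve, which is a local and not necessarily global optimum. Because the search starts from the signs of the dominant right singular vector and each flip only increases $\|\mathbf X \mathbf b\|_2$, the returned vector attains an L1 metric at least as large as that of the L2-PC.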
