
DiffRed: Dimensionality Reduction guided by stable rank

Abstract

In this work, we propose a novel dimensionality reduction technique, DiffRed, which first projects the data matrix $A$ along the first $k_1$ principal components and then projects the residual matrix $A^{*}$ (left after subtracting the rank-$k_1$ approximation of $A$) along $k_2$ Gaussian random vectors. We evaluate M1, the distortion of mean-squared pairwise distance, and Stress, the normalized RMS of the distortion of pairwise distances. We rigorously prove that DiffRed achieves a general upper bound of $O\left(\sqrt{\frac{1-p}{k_2}}\right)$ on Stress and $O\left(\frac{1-p}{\sqrt{k_2\,\rho(A^{*})}}\right)$ on M1, where $p$ is the fraction of variance explained by the first $k_1$ principal components and $\rho(A^{*})$ is the stable rank of $A^{*}$. These bounds are tighter than the currently known results for Random maps. Our extensive experiments on a variety of real-world datasets demonstrate that DiffRed achieves near-zero M1 and much lower values of Stress than well-known dimensionality reduction techniques. In particular, DiffRed can map a 6 million dimensional dataset to 10 dimensions with 54% lower Stress than PCA.
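The two-stage projection described above can be sketched in NumPy. This is a minimal illustration based only on the abstract, not the authors' implementation; the function name, the centering step, and the $1/\sqrt{k_2}$ scaling of the Gaussian map are assumptions.

```python
import numpy as np

def diffred(A, k1, k2, seed=0):
    """Sketch of the DiffRed idea: PCA coordinates for the top-k1
    directions, plus a Gaussian random map applied to the residual
    A* = A - (rank-k1 approximation of A). Details are assumptions."""
    rng = np.random.default_rng(seed)
    # Center the data so that principal components are well-defined.
    A = A - A.mean(axis=0)
    # Top-k1 principal directions via SVD of the centered matrix.
    U, S, Vt = np.linalg.svd(A, full_matrices=False)
    V1 = Vt[:k1].T                 # (d, k1) principal directions
    Z1 = A @ V1                    # coordinates along top-k1 PCs
    # Residual left after subtracting the rank-k1 approximation.
    A_star = A - Z1 @ V1.T
    # Gaussian random map on the residual; the 1/sqrt(k2) scaling is
    # the standard Johnson-Lindenstrauss normalization (an assumption).
    G = rng.standard_normal((A.shape[1], k2)) / np.sqrt(k2)
    Z2 = A_star @ G
    return np.hstack([Z1, Z2])     # (n, k1 + k2) embedding
```

For an $n \times d$ data matrix this returns an $n \times (k_1 + k_2)$ embedding, with the first $k_1$ columns capturing the fraction $p$ of variance and the remaining $k_2$ columns approximately preserving distances in the residual.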
