DiffRed: Dimensionality Reduction guided by stable rank

In this work, we propose a novel dimensionality reduction technique, DiffRed, which first projects the data matrix $A$ along its first $k_1$ principal components and then projects the residual matrix $A^*$ (what remains after subtracting the rank-$k_1$ approximation of $A$) along $k_2$ Gaussian random vectors. We evaluate two metrics: M1, the distortion of the mean-squared pairwise distance, and Stress, the normalized root-mean-square distortion of the pairwise distances. We rigorously prove that DiffRed achieves a general upper bound of $O\left(\sqrt{\frac{1-p}{k_2}}\right)$ on Stress and $O\left(\frac{1-p}{\sqrt{k_2\,\rho(A^*)}}\right)$ on M1, where $p$ is the fraction of variance explained by the first $k_1$ principal components and $\rho(A^*)$ is the stable rank of $A^*$. These bounds are tighter than the currently known results for random maps. Our extensive experiments on a variety of real-world datasets demonstrate that DiffRed achieves near-zero M1 and much lower Stress than well-known dimensionality reduction techniques. In particular, DiffRed can map a 6-million-dimensional dataset to 10 dimensions with 54% lower Stress than PCA.
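A minimal NumPy sketch of the two-stage projection described above may help make it concrete. It is an illustrative reading of the abstract, not the authors' implementation. The function names (`diffred`, `stable_rank`, `variance_fraction`, `stress`, `m1`) are hypothetical. The scaling of the Gaussian map by $1/\sqrt{k_2}$ and the exact M1 formula ($|1 - \overline{\hat d^2}/\overline{d^2}|$) are assumptions. Stable rank uses the standard definition $\rho(M) = \|M\|_F^2 / \|M\|_2^2$.

```python
import numpy as np


def diffred(A, k1, k2, seed=0):
    """Hypothetical sketch: top-k1 PCA scores of A, plus a scaled Gaussian
    random map (k2 directions) applied to the residual A* = A - A_k1."""
    rng = np.random.default_rng(seed)
    A = np.asarray(A, dtype=float)
    A = A - A.mean(axis=0)              # centering leaves pairwise distances unchanged
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    V = Vt[:k1].T                       # D x k1 principal directions
    Z_pca = A @ V                       # n x k1 principal-component scores
    A_res = A - Z_pca @ V.T             # residual after the rank-k1 approximation
    G = rng.standard_normal((A.shape[1], k2))
    Z_rand = A_res @ G / np.sqrt(k2)    # assumed 1/sqrt(k2) scaling (norms kept in expectation)
    return np.hstack([Z_pca, Z_rand])   # n x (k1 + k2) embedding


def stable_rank(M):
    """Stable rank ||M||_F^2 / ||M||_2^2 (squared Frobenius over squared spectral norm)."""
    s = np.linalg.svd(np.asarray(M, dtype=float), compute_uv=False)
    return float(np.sum(s ** 2) / s[0] ** 2)


def variance_fraction(A, k1):
    """Fraction p of total variance captured by the first k1 principal components."""
    A = np.asarray(A, dtype=float)
    s = np.linalg.svd(A - A.mean(axis=0), compute_uv=False)
    return float(np.sum(s[:k1] ** 2) / np.sum(s ** 2))


def _pairwise(X):
    """Upper-triangle pairwise Euclidean distances (O(n^2 D) memory; small n only)."""
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt(np.sum(diff ** 2, axis=-1))
    return D[np.triu_indices(X.shape[0], k=1)]


def stress(A, Z):
    """Normalized RMS distortion of pairwise distances."""
    d, d_hat = _pairwise(np.asarray(A, float)), _pairwise(np.asarray(Z, float))
    return float(np.sqrt(np.sum((d - d_hat) ** 2) / np.sum(d ** 2)))


def m1(A, Z):
    """Assumed form: relative distortion of the mean squared pairwise distance."""
    d, d_hat = _pairwise(np.asarray(A, float)), _pairwise(np.asarray(Z, float))
    return float(abs(1.0 - np.mean(d_hat ** 2) / np.mean(d ** 2)))
```

For example, `diffred(A, 3, 5)` on a 50 x 20 matrix returns a 50 x 8 embedding. The design relies on the PCA block and the residual being orthogonal, so squared pairwise distances split into an exactly preserved part (from the $k_1$ principal components) plus a residual part that the random map preserves only approximately. When $k_1$ equals the data dimension the residual vanishes and Stress is essentially zero.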