Sparse Dimensionality Reduction Revisited

Abstract

The sparse Johnson-Lindenstrauss transform is one of the central techniques in dimensionality reduction. It supports embedding a set of $n$ points in $\mathbb{R}^d$ into $m = O(\varepsilon^{-2} \lg n)$ dimensions while preserving all pairwise distances to within $1 \pm \varepsilon$. Each input point $x$ is embedded to $Ax$, where $A$ is an $m \times d$ matrix having $s$ non-zeros per column, allowing for an embedding time of $O(s \|x\|_0)$. Since the sparsity of $A$ governs the embedding time, much work has gone into improving the sparsity $s$. The current state of the art by Kane and Nelson (JACM'14) shows that $s = O(\varepsilon^{-1} \lg n)$ suffices. This is almost matched by a lower bound of $s = \Omega(\varepsilon^{-1} \lg n / \lg(1/\varepsilon))$ by Nelson and Nguyen (STOC'13). Previous work thus suggests that we have near-optimal embeddings. In this work, we revisit sparse embeddings and identify a loophole in the lower bound: it requires $d \geq n$, which in many applications is unrealistic. We exploit this loophole to give a sparser embedding when $d = o(n)$, achieving $s = O(\varepsilon^{-1}(\lg n / \lg(1/\varepsilon) + \lg^{2/3} n \, \lg^{1/3} d))$. We also complement our analysis by strengthening the lower bound of Nelson and Nguyen to hold also when $d \ll n$, thereby matching the first term in our new sparsity upper bound. Finally, we also improve the sparsity of the best oblivious subspace embeddings for optimal embedding dimensionality.
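To make the mechanism concrete, below is a minimal NumPy sketch of one simple sparse JL variant, given for illustration only (it is not the paper's construction; the Kane-Nelson block variant places one non-zero in each of $s$ row blocks, and the constants in $m$ and $s$ are assumptions here): each column of $A$ holds $s$ random signs, scaled by $1/\sqrt{s}$, in $s$ distinct random rows. Since $\|Ax - Ay\| = \|A(x - y)\|$, checking norm preservation suffices to check pairwise-distance preservation.

```python
import numpy as np

def sparse_jl_matrix(d, m, s, rng):
    """m x d sparse JL matrix with exactly s non-zeros per column:
    each column gets random signs scaled by 1/sqrt(s) in s distinct
    uniformly random rows (a simple variant, not the block one)."""
    A = np.zeros((m, d))
    for j in range(d):
        rows = rng.choice(m, size=s, replace=False)            # s distinct rows
        A[rows, j] = rng.choice([-1.0, 1.0], size=s) / np.sqrt(s)
    return A

rng = np.random.default_rng(0)
n, d, eps = 1000, 500, 0.25                    # illustrative parameters
m = int(np.ceil(eps**-2 * np.log(n)))          # m = O(eps^-2 lg n), constants assumed
s = min(m, int(np.ceil(eps**-1 * np.log(n))))  # s = O(eps^-1 lg n) non-zeros per column

A = sparse_jl_matrix(d, m, s, rng)
X = rng.standard_normal((d, n))                # n points in R^d, one per column
Y = A @ X                                      # embedded points in R^m
ratios = np.linalg.norm(Y, axis=0) / np.linalg.norm(X, axis=0)
print(ratios.min(), ratios.max())              # should fall roughly within [1-eps, 1+eps]
```

Note that the embedding of a single point costs $O(s \|x\|_0)$ rather than $O(m \|x\|_0)$ when $A$ is stored in a sparse format, which is exactly why shrinking $s$ matters.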
