RACE: Sub-Linear Memory Sketches for Approximate Near-Neighbor Search on Streaming Data

Anshumali Shrivastava
Richard G. Baraniuk
Abstract

We present the first sublinear memory sketch which can be queried to find the $v$ nearest neighbors in a dataset. Our online sketching algorithm can compress an $N$-element dataset to a sketch of size $O(N^b \log^3 N)$ in $O(N^{b+1} \log^3 N)$ time, where $b < 1$ when the query satisfies a data-dependent near-neighbor stability condition. We achieve data-dependent sublinear space by combining recent advances in locality sensitive hashing (LSH)-based estimators with compressed sensing. Our results shed new light on the memory-accuracy tradeoff for near-neighbor search. The techniques presented reveal a deep connection between the fundamental compressed sensing (or heavy hitters) recovery problem and near-neighbor search, leading to new insights for geometric search problems and implications for sketching algorithms.
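
The abstract names the ingredients (LSH-based estimators combined with compressed-sensing or heavy-hitter recovery) without spelling out a data structure. The Python block below is a hedged toy illustration of how those ingredients can fit together. It is not the authors' algorithm, and it does not reproduce the $O(N^b \log^3 N)$ space bound. It assumes signed-random-projection LSH, a fixed array of counters inside each LSH bucket indexed by a hash of the item ID, and a brute-force scan over candidate IDs that stands in for a real sparse-recovery decoder. The class name `ToyLSHHeavyHitterSketch`, its parameters `reps`, `bits`, and `width`, and the synthetic data are all hypothetical choices made for this example.

```python
import random


class ToyLSHHeavyHitterSketch:
    """Hypothetical illustration (not the paper's algorithm): `reps` rows of
    LSH buckets, where each bucket holds `width` counters indexed by a hash of
    the item ID, so the total counter budget does not depend on N."""

    def __init__(self, dim, reps=32, bits=3, width=64, seed=0):
        rng = random.Random(seed)
        self.reps, self.width = reps, width
        # Signed random projection LSH: `bits` Gaussian hyperplanes per row.
        self.planes = [[[rng.gauss(0.0, 1.0) for _ in range(dim)]
                        for _ in range(bits)] for _ in range(reps)]
        # Per-row salt so each row maps item IDs to columns differently.
        self.salts = [rng.getrandbits(32) for _ in range(reps)]
        # counts[row][bucket][column], initially zero.
        self.counts = [[[0] * width for _ in range(2 ** bits)]
                       for _ in range(reps)]

    def _bucket(self, row, x):
        # Concatenate the sign bits of the projections into a bucket index.
        code = 0
        for plane in self.planes[row]:
            dot = sum(p * xi for p, xi in zip(plane, x))
            code = (code << 1) | (1 if dot >= 0 else 0)
        return code

    def _column(self, row, item_id):
        # Hash the ID into one of `width` counters (collisions add noise).
        return hash((self.salts[row], item_id)) % self.width

    def insert(self, x, item_id):
        # Streaming update: one counter increment per row; x is not stored.
        for row in range(self.reps):
            self.counts[row][self._bucket(row, x)][self._column(row, item_id)] += 1

    def query(self, q, candidate_ids, v):
        # Score each candidate by its counters in the buckets that q hashes to.
        # Near neighbors share q's bucket in most rows, so they behave like
        # "heavy hitters" among the counters read at query time.
        buckets = [self._bucket(row, q) for row in range(self.reps)]
        scores = {
            i: sum(self.counts[row][buckets[row]][self._column(row, i)]
                   for row in range(self.reps))
            for i in candidate_ids
        }
        return sorted(scores, key=lambda i: (-scores[i], i))[:v]


# Usage: 5 points close to q (IDs 0-4) and 20 points pointing away (IDs 5-24).
rng = random.Random(1)
dim = 8
q = [1.0] + [0.0] * (dim - 1)
near = [[qi + rng.gauss(0.0, 0.05) for qi in q] for _ in range(5)]
far = [[-qi + rng.gauss(0.0, 0.05) for qi in q] for _ in range(20)]

sketch = ToyLSHHeavyHitterSketch(dim)
for item_id, x in enumerate(near + far):
    sketch.insert(x, item_id)

result = sketch.query(q, range(len(near) + len(far)), v=5)
print(sorted(result))
```

In this toy, the sketch holds `reps * 2**bits * width` counters no matter how many points are inserted. The scan over `candidate_ids` is the part that makes querying linear in $N$. Replacing that scan with a proper compressed-sensing or heavy-hitters decoder is presumably where the connection highlighted in the abstract would come into play.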
