Streaming Binary Sketching based on Subspace Tracking and Diagonal Uniformization
In this paper, we address the problem of learning compact similarity-preserving embeddings for massive high-dimensional data streams in order to perform efficient similarity search. We present a new method for computing compressed binary representations (\textit{sketches}) of high-dimensional real feature vectors. Given a target code length and high-dimensional input data points, our algorithm produces a binary code of that length that aims to preserve the distances between the points in the original high-dimensional space. The offline version of our algorithm outperforms state-of-the-art offline methods in computational time complexity while achieving similar sketch quality. It also provides convergence guarantees. Moreover, our algorithm can be used directly in the streaming context, since it requires storing neither the whole dataset nor a chunk of it. We demonstrate the quality of our binary sketches through extensive experiments on real data for the nearest-neighbor search task in both the offline and online settings.
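To make the notion of a binary sketch concrete, the following is a minimal illustration of the general idea only. It is not the method proposed in this paper: it swaps in the classical sign-of-random-projection scheme (random hyperplane hashing) in place of subspace tracking and diagonal uniformization. The function names, the Gaussian projection matrix, and the code length used below are all assumptions made for illustration.

```python
import numpy as np


def make_projection(dim, code_length, seed=0):
    """Draw a random Gaussian projection (hypothetical stand-in for a learned one)."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((dim, code_length))


def binary_sketch(x, projection):
    """Map a real vector to a binary code: one bit per projected coordinate sign."""
    return (x @ projection > 0).astype(np.uint8)


def hamming_distance(a, b):
    """Number of bit positions where two sketches differ."""
    return int(np.count_nonzero(a != b))


# Illustrative usage with an assumed dimension and code length.
dim, code_length = 128, 32
projection = make_projection(dim, code_length)
x = np.random.default_rng(1).standard_normal(dim)

code_x = binary_sketch(x, projection)
code_neg = binary_sketch(-x, projection)
```

In this baseline, the Hamming distance between two sketches grows with the angle between the original vectors, so opposite vectors receive complementary codes. A data-dependent method such as the one described here would replace the random matrix with a projection fitted to the data.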