Clustering is Efficient for Approximate Maximum Inner Product Search
Locality Sensitive Hashing (LSH) techniques have recently become a popular solution for the approximate Maximum Inner Product Search (MIPS) problem, which arises in many situations and has in particular been used to speed up the training of large neural probabilistic language models. In this paper we propose a new approach to approximate MIPS based on a variant of the k-means algorithm. We suggest using spherical k-means, an algorithm that can efficiently solve the approximate Maximum Cosine Similarity Search (MCSS) problem, and, building on previous work by Shrivastava and Li, we show how it can be adapted to approximate MIPS. Our new method compares favorably with LSH-based methods on a simple recall-rate test, providing a more accurate set of candidates for the maximum inner product. The proposed method is thus likely to benefit the wide range of problems with very large search spaces where a robust approximate MIPS heuristic is of interest, such as providing a high-quality short list of candidate words to speed up the training of neural probabilistic language models.
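The overall approach can be illustrated with a minimal sketch: a vector-augmentation step in the spirit of Shrivastava and Li reduces MIPS to MCSS, and a spherical k-means index retrieves candidates by cosine similarity. This is an illustrative reconstruction under simple assumptions (a single extra coordinate equalizing all norms, naive cluster probing), not the paper's exact construction; all function names are hypothetical.

```python
import numpy as np

def augment_database(X):
    # Append one coordinate so every database vector has the same norm M;
    # inner products with zero-padded queries are unchanged, so cosine
    # similarity on the augmented vectors ranks exactly by inner product.
    norms = np.linalg.norm(X, axis=1)
    M = norms.max()
    extra = np.sqrt(np.maximum(M ** 2 - norms ** 2, 0.0))
    return np.hstack([X, extra[:, None]]), M

def augment_query(q):
    # A zero in the extra coordinate leaves inner products untouched.
    return np.append(q, 0.0)

def spherical_kmeans(X, k, iters=20, seed=0):
    # Spherical k-means: centroids are kept on the unit sphere and points
    # are assigned to the centroid of maximum cosine similarity.
    rng = np.random.default_rng(seed)
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    C = Xn[rng.choice(len(Xn), size=k, replace=False)]
    for _ in range(iters):
        assign = (Xn @ C.T).argmax(axis=1)
        for j in range(k):
            members = Xn[assign == j]
            if len(members):
                c = members.sum(axis=0)
                C[j] = c / np.linalg.norm(c)
    return C, assign

def mips_candidates(q, X_aug, C, assign, n_probe=2):
    # Probe the clusters whose centroids best match the augmented query
    # and return the indices of their members as MIPS candidates.
    qa = augment_query(q)
    qa = qa / np.linalg.norm(qa)
    best = np.argsort(-(C @ qa))[:n_probe]
    return np.flatnonzero(np.isin(assign, best))
```

Scanning only the returned candidate set (rather than the full database) for the largest inner product gives the approximate-MIPS speed-up; recall improves as more clusters are probed.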