Approximate Nearest Neighbor Search in High Dimensions

Abstract

The nearest neighbor problem is defined as follows: given a set $P$ of $n$ points in some metric space $(X,D)$, build a data structure that, given any point $q$, returns a point in $P$ that is closest to $q$ (its "nearest neighbor" in $P$). The data structure stores additional information about the set $P$, which is then used to find the nearest neighbor without computing all distances between $q$ and $P$. The problem has a wide range of applications in machine learning, computer vision, databases and other fields. To reduce the time needed to find nearest neighbors and the amount of memory used by the data structure, one can formulate the approximate nearest neighbor problem, where the goal is to return any point $p' \in P$ such that the distance from $q$ to $p'$ is at most $c \cdot \min_{p \in P} D(q,p)$, for some $c \geq 1$. Over the last two decades, many efficient solutions to this problem have been developed. In this article we survey these developments, as well as their connections to questions in geometric functional analysis and combinatorial geometry.
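
To make the two problem statements concrete, here is a minimal sketch, assuming points are tuples of floats and that the metric $D$ defaults to Euclidean distance (Python's `math.dist`). The function names `nearest_neighbor` and `is_c_approximate` are hypothetical and only illustrate the definitions. The linear scan computes all $n$ distances from $q$, which is exactly the baseline cost a nearest neighbor data structure is meant to avoid; it is not any of the surveyed methods.

```python
import math
from typing import Callable, Sequence, Tuple

# Hypothetical types for illustration: a point is a tuple of floats,
# and a metric D maps two points to a non-negative float.
Point = Tuple[float, ...]
Metric = Callable[[Point, Point], float]


def nearest_neighbor(P: Sequence[Point], q: Point, D: Metric = math.dist) -> Point:
    """Exact nearest neighbor of q in P by linear scan.

    Evaluates D(q, p) for every p in P, i.e. the n distance computations
    that a nearest neighbor data structure tries to avoid.
    """
    if not P:
        raise ValueError("P must contain at least one point")
    return min(P, key=lambda p: D(q, p))


def is_c_approximate(P: Sequence[Point], q: Point, p_prime: Point,
                     c: float, D: Metric = math.dist) -> bool:
    """Check the approximate condition: p' is in P and
    D(q, p') <= c * min_{p in P} D(q, p), for a given c >= 1."""
    if c < 1:
        raise ValueError("the approximation factor c must satisfy c >= 1")
    if p_prime not in P:
        return False
    best = D(q, nearest_neighbor(P, q, D))
    return D(q, p_prime) <= c * best


if __name__ == "__main__":
    P = [(3.0, 4.0), (1.0, 0.0), (0.0, 6.0)]
    q = (0.0, 0.0)
    print(nearest_neighbor(P, q))                   # (1.0, 0.0), at distance 1
    print(is_c_approximate(P, q, (3.0, 4.0), 6.0))  # True: 5 <= 6 * 1
    print(is_c_approximate(P, q, (3.0, 4.0), 4.0))  # False: 5 > 4 * 1
```

In the example, $(3,4)$ lies at distance 5 from $q$ while the true nearest neighbor lies at distance 1, so $(3,4)$ is a valid answer for any $c \geq 5$ but not for smaller $c$. Larger $c$ admits more answers, which is the slack that approximate data structures exploit.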
