
Efficient Construction of Neighborhood Graphs by the Multiple Sorting Method

Abstract

Neighborhood graphs are gaining popularity as a concise data representation in machine learning. However, naive graph construction by pairwise distance calculation takes $O(n^2)$ time for $n$ data points, which is prohibitively slow for millions of data points. For strings of equal length, the multiple sorting method (Uno, 2008) can construct an $\epsilon$-neighbor graph in $O(n+m)$ time, where $m$ is the number of $\epsilon$-neighbor pairs in the data. To bring this remarkably efficient algorithm to continuous domains such as images, signals and texts, we employ a random projection method to convert vectors to strings. Theoretical results are presented to elucidate the trade-off between approximation quality and computation time. Empirical results show the efficiency of our method in comparison to fast nearest neighbor alternatives.
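The abstract only summarizes the multiple sorting idea, so the following Python sketch illustrates one reading of it under stated assumptions. It treats $\epsilon$ as a Hamming-distance threshold `d` on equal-length strings and cuts each string into `k > d` blocks (`k` is a hypothetical parameter name). It relies on the pigeonhole fact that a pair differing in at most `d` positions matches exactly on at least `k - d` blocks. For every choice of `k - d` blocks, it sorts the strings by those blocks and compares only strings that land in the same group. Each pair is reported under its first `k - d` agreeing blocks, so it is emitted once. The vector-to-string step uses sign-of-Gaussian-projection bits (a SimHash-style encoder) purely as a stand-in, and the paper's actual projection scheme may differ. All function names are hypothetical.

```python
import itertools
import random


def random_projection_bits(vectors, n_bits, seed=0):
    # Hypothetical stand-in encoder (SimHash-style), not necessarily the paper's:
    # bit t is "1" iff the projection onto the t-th Gaussian direction is >= 0.
    rng = random.Random(seed)
    dim = len(vectors[0])
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]
    return [
        "".join("1" if sum(w * x for w, x in zip(plane, v)) >= 0 else "0" for plane in planes)
        for v in vectors
    ]


def hamming(a, b):
    """Number of positions at which equal-length strings a and b differ."""
    return sum(x != y for x, y in zip(a, b))


def multiple_sorting_pairs(strings, d, k):
    """All index pairs (i, j), i < j, whose Hamming distance is at most d.

    Assumption: epsilon is read as the Hamming threshold d, and each string is
    cut into k > d contiguous blocks. At most d mismatches touch at most d
    blocks, so every qualifying pair agrees exactly on >= k - d blocks.
    """
    if not strings:
        return []
    if not k > d:
        raise ValueError("need k > d so that close pairs share at least one block")
    n, length = len(strings), len(strings[0])
    bounds = [(b * length // k, (b + 1) * length // k) for b in range(k)]
    blocks = [[s[lo:hi] for lo, hi in bounds] for s in strings]
    pairs = []
    for combo in itertools.combinations(range(k), k - d):
        def key(i, combo=combo):
            return tuple(blocks[i][b] for b in combo)

        # Sorting by the chosen blocks makes strings that share them adjacent.
        order = sorted(range(n), key=key)
        for _, group in itertools.groupby(order, key=key):
            for i, j in itertools.combinations(sorted(group), 2):
                agree = [b for b in range(k) if blocks[i][b] == blocks[j][b]]
                # Emit the pair only under its first k - d agreeing blocks so it
                # is reported once, not once per shared block combination.
                if tuple(agree[: k - d]) == combo and hamming(strings[i], strings[j]) <= d:
                    pairs.append((i, j))
    return sorted(pairs)


if __name__ == "__main__":
    print(multiple_sorting_pairs(["0000", "0001", "0011", "1111"], d=1, k=4))  # [(0, 1), (1, 2)]
    # Hypothetical end-to-end use: encode vectors, then find codes within 2 bit flips.
    codes = random_projection_bits([[1.0, 0.2], [0.9, 0.25], [-1.0, 0.1]], n_bits=16)
    print(multiple_sorting_pairs(codes, d=2, k=4))
```

The sketch uses Python's comparison-based `sorted` and re-checks every candidate pair, so it makes no attempt at the $O(n+m)$ bound quoted in the abstract. It is meant to show why sorting on block subsets finds every $\epsilon$-neighbor pair, not how fast it does so. Note also that the number of sorting passes, $\binom{k}{d}$, grows quickly with `d`.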
