
Empirical complexity of comparator-based nearest neighbor descent

Abstract

A Java parallel streams implementation of the $K$-nearest neighbor descent algorithm is presented, using a natural statistical termination criterion. Input data consist of a set $S$ of $n$ objects of type V, and a Function<V, Comparator<V>>, which enables any $x \in S$ to decide which of $y, z \in S \setminus \{x\}$ is more similar to $x$. Experiments with the Kullback-Leibler divergence Comparator support the prediction that the number of rounds of $K$-nearest neighbor updates need not exceed twice the diameter of the undirected version of a random regular out-degree $K$ digraph on $n$ vertices. Overall complexity was $O(n K^2 \log_K(n))$ in the class of examples studied. When objects are sampled uniformly from a $d$-dimensional simplex, accuracy of the $K$-nearest neighbor approximation is high up to $d = 20$, but declines in higher dimensions, as theory would predict.
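The Function<V, Comparator<V>> interface mentioned above can be illustrated with a minimal Java sketch. The class name `KLComparator` and the helper `kl` are hypothetical, and the sketch assumes discrete probability vectors with strictly positive entries; it only shows how each point $x$ induces a Comparator ranking other points by Kullback-Leibler divergence from $x$, not the full descent algorithm.

```java
import java.util.Comparator;
import java.util.function.Function;

public class KLComparator {
    // KL divergence D(p || q); assumes p and q are same-length
    // probability vectors with strictly positive entries.
    static double kl(double[] p, double[] q) {
        double d = 0.0;
        for (int i = 0; i < p.length; i++) {
            d += p[i] * Math.log(p[i] / q[i]);
        }
        return d;
    }

    // For each point x, produce a Comparator that ranks y, z by their
    // divergence from x -- the Function<V, Comparator<V>> shape from
    // the abstract, here with V = double[].
    static Function<double[], Comparator<double[]>> similarity =
        x -> Comparator.comparingDouble(y -> kl(x, y));

    public static void main(String[] args) {
        double[] x = {0.5, 0.3, 0.2};
        double[] y = {0.5, 0.25, 0.25}; // close to x
        double[] z = {0.1, 0.1, 0.8};   // far from x
        Comparator<double[]> byX = similarity.apply(x);
        // Negative result: y is judged more similar to x than z is.
        System.out.println(byX.compare(y, z) < 0);
    }
}
```

Note that the algorithm never needs the divergence values themselves, only the Comparator's ordering, which is what makes a comparator-based (rather than metric-based) formulation possible.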
