
Empirical complexity of comparator-based nearest neighbor descent

Abstract

A Java parallel streams implementation of the $K$-nearest neighbor descent algorithm is presented, using a natural statistical termination criterion. Input data consist of a set $S$ of $n$ objects of type V, and a Function<V, Comparator<V>>, which enables any $x \in S$ to decide which of $y, z \in S \setminus \{x\}$ is more similar to $x$. Experiments with the Kullback-Leibler divergence Comparator support the prediction that the number of rounds of $K$-nearest neighbor updates need not exceed twice the diameter of the undirected version of a random regular out-degree $K$ digraph on $n$ vertices. Overall complexity was $O(n K^2 \log_K(n))$ in the class of examples studied. When objects are sampled uniformly from a $d$-dimensional simplex, accuracy of the $K$-nearest neighbor approximation is high up to $d = 20$, but declines in higher dimensions, as theory would predict.
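The Function<V, Comparator<V>> interface mentioned above can be illustrated with a minimal Java sketch. The class name `KLComparator` and the helper `kl` are hypothetical, and the sketch assumes discrete probability vectors with strictly positive entries; it only shows how each point $x$ induces a Comparator ranking other points by Kullback-Leibler divergence from $x$, not the full descent algorithm.

```java
import java.util.Comparator;
import java.util.function.Function;

public class KLComparator {
    // KL divergence D(p || q); assumes p and q are same-length
    // probability vectors with strictly positive entries.
    static double kl(double[] p, double[] q) {
        double d = 0.0;
        for (int i = 0; i < p.length; i++) {
            d += p[i] * Math.log(p[i] / q[i]);
        }
        return d;
    }

    // For each point x, produce a Comparator that ranks y, z by their
    // divergence from x -- the Function<V, Comparator<V>> shape from
    // the abstract, here with V = double[].
    static Function<double[], Comparator<double[]>> similarity =
        x -> Comparator.comparingDouble(y -> kl(x, y));

    public static void main(String[] args) {
        double[] x = {0.5, 0.3, 0.2};
        double[] y = {0.5, 0.25, 0.25}; // close to x
        double[] z = {0.1, 0.1, 0.8};   // far from x
        Comparator<double[]> byX = similarity.apply(x);
        // Negative result: y is judged more similar to x than z is.
        System.out.println(byX.compare(y, z) < 0);
    }
}
```

Note that the algorithm never needs the divergence values themselves, only the Comparator's ordering, which is what makes a comparator-based (rather than metric-based) formulation possible.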
