
Validation of k-Nearest Neighbor Classifiers Using Inclusion and Exclusion

Abstract

This paper presents a series of PAC error bounds for $k$-nearest neighbors classifiers, with $O(n^{-\frac{r}{2r+1}})$ expected range in the difference between error bound and actual error rate, for each integer $r>0$, where $n$ is the number of in-sample examples. The best previous expected bound range was $O(n^{-\frac{2}{5}})$. The result shows that $k$-nn classifiers, in spite of their famously fractured decision boundaries, come arbitrarily close to having Gaussian-style $O(n^{-\frac{1}{2}})$ expected differences between PAC (probably approximately correct) error bounds and actual expected out-of-sample error rates.
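
The claim that the rates come "arbitrarily close" to the Gaussian-style rate can be checked with a short piece of arithmetic on the exponent. The following is only an illustrative sketch based on the exponent stated in the abstract, not a derivation taken from the paper:

```latex
% Exponent of the bound range, rewritten to expose its gap from 1/2
\[
  \frac{r}{2r+1} \;=\; \frac{1}{2} - \frac{1}{2(2r+1)},
  \qquad\text{so}\qquad
  \lim_{r\to\infty} \frac{r}{2r+1} \;=\; \frac{1}{2}.
\]
% Sample values of the exponent for small r
\[
  r=1:\ \tfrac{1}{3}, \qquad
  r=2:\ \tfrac{2}{5}, \qquad
  r=3:\ \tfrac{3}{7}, \qquad
  r=10:\ \tfrac{10}{21}\approx 0.476 .
\]
```

The exponent increases with $r$ and stays strictly below $\frac{1}{2}$ for every finite $r$, so $O(n^{-\frac{1}{2}})$ is approached but not attained. At $r=2$ the exponent equals $\frac{2}{5}$, which matches the previous best rate quoted above, so under this reading improvements over that rate begin at $r=3$.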
