Under-bagging Nearest Neighbors for Imbalanced Classification

Abstract

In this paper, we propose an ensemble learning algorithm called \textit{under-bagging $k$-nearest neighbors} (\textit{under-bagging $k$-NN}) for imbalanced classification problems. On the theoretical side, by developing a new learning theory analysis, we show that with properly chosen parameters, namely the number of nearest neighbors $k$, the expected sub-sample size $s$, and the number of bagging rounds $B$, under-bagging $k$-NN achieves optimal convergence rates w.r.t.~the arithmetic mean (AM) of recalls under mild assumptions. Moreover, we show that with a relatively small $B$, the expected sub-sample size $s$ at each bagging round can be much smaller than the number of training data $n$, and the number of nearest neighbors $k$ can be reduced simultaneously, especially when the data are highly imbalanced. This leads to substantially lower time complexity and roughly the same space complexity. On the practical side, we conduct numerical experiments that verify the theoretical benefits of the under-bagging technique through the promising AM performance and efficiency of the proposed algorithm.
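
To make the ingredients named in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It assumes a simple under-sampling scheme (draw the same number of points from every class, without replacement, at each of the B rounds), fits a brute-force k-NN on each balanced sub-sample, and averages the per-class neighbor fractions across rounds. The function names `under_bagging_knn_predict` and `am_recall`, and the parameter `per_class` (a stand-in for the expected sub-sample size, roughly s divided by the number of classes), are hypothetical. The exact sampling rule analyzed in the paper may differ.

```python
import numpy as np

def under_bagging_knn_predict(X_train, y_train, X_test, k=5, per_class=20, B=10, seed=0):
    """Illustrative under-bagging k-NN (assumed scheme, not the paper's exact one)."""
    rng = np.random.default_rng(seed)
    classes = np.unique(y_train)
    votes = np.zeros((len(X_test), len(classes)))
    for _ in range(B):
        # Assumption: balance the classes by drawing the same number of
        # points from each class (capped at the class size).
        idx = np.concatenate([
            rng.choice(np.flatnonzero(y_train == c),
                       size=min(per_class, int(np.sum(y_train == c))),
                       replace=False)
            for c in classes
        ])
        Xs, ys = X_train[idx], y_train[idx]
        # Brute-force squared Euclidean distances, shape (n_test, n_sub).
        d = ((X_test[:, None, :] - Xs[None, :, :]) ** 2).sum(axis=2)
        kk = min(k, len(idx))
        nearest = np.argsort(d, axis=1)[:, :kk]
        neigh_labels = ys[nearest]
        # Accumulate the fraction of the k nearest neighbors in each class.
        for j, c in enumerate(classes):
            votes[:, j] += (neigh_labels == c).mean(axis=1)
    # Predict the class with the largest averaged neighbor fraction.
    return classes[votes.argmax(axis=1)]

def am_recall(y_true, y_pred):
    """Arithmetic mean of per-class recalls."""
    classes = np.unique(y_true)
    return float(np.mean([(y_pred[y_true == c] == c).mean() for c in classes]))
```

Under this sketch, each round computes distances to only a small balanced sub-sample rather than to all n training points, which is the source of the time savings the abstract describes when the data are highly imbalanced.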
