DISCERN: Diversity-based Selection of Centroids for k-Estimation and Rapid Non-stochastic Clustering

Abstract

Clustering algorithms are an important class of unsupervised learning methods. They require parameters such as the number of clusters, or neighborhood size and radius, which are usually unknown and can be hard to estimate. Moreover, some of the most efficient clustering algorithms, such as K-Means, are stochastic and at times not robust. To address these issues, we propose DISCERN, which can serve as an initialization algorithm for K-Means by finding suitable centroids that improve its performance. The algorithm is also designed to estimate the number of clusters, which is its only parameter, and does not require stochastic initialization. We ran experiments with the proposed method on multiple types of datasets, and the results show that it outperforms the other methods compared, in terms of both results and robustness. We also discuss its advantage in estimating the number of clusters, as well as the lower computational complexity of this estimation.
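To make the idea of non-stochastic, diversity-based centroid selection concrete, the sketch below uses farthest-first traversal with Euclidean distance. This is a stand-in technique chosen for illustration, not the DISCERN procedure itself, whose details the abstract does not give. The function name `farthest_first_centroids` and the choice of distance are assumptions.

```python
import math


def farthest_first_centroids(points, k):
    """Deterministically pick k mutually distant points as initial centroids.

    Illustrative farthest-first traversal; NOT the DISCERN algorithm.
    """
    n = len(points)
    if k < 1 or k > n:
        raise ValueError("k must be between 1 and len(points)")
    if k == 1:
        return [points[0]]

    # Seed with the most distant pair (ties go to the lowest indices).
    best = (-1.0, 0, 1)
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(points[i], points[j])
            if d > best[0]:
                best = (d, i, j)
    chosen = [best[1], best[2]]

    # min_dist[i]: distance from point i to its nearest chosen centroid.
    min_dist = [min(math.dist(p, points[c]) for c in chosen) for p in points]

    while len(chosen) < k:
        # Take the point farthest from all centroids chosen so far.
        nxt = max(range(n), key=lambda i: min_dist[i])
        chosen.append(nxt)
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[nxt]))

    return [points[i] for i in chosen]


if __name__ == "__main__":
    data = [(0, 0), (0.1, 0), (10, 0), (10, 0.2), (5, 8)]
    print(farthest_first_centroids(data, 3))
```

Because no random numbers are involved, repeated runs on the same data return the same centroids, which is the property the abstract contrasts with the stochastic initialization of K-Means. The returned points could then be passed as the initial centroids to any K-Means implementation that accepts explicit starting centers.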
