A semi-supervised sparse K-Means algorithm

Pattern Recognition Letters (Pattern Recognit. Lett.), 2020

16 March 2020

Abstract

We consider the problem of clustering data whose feature quality is unknown but for which a small amount of labelled data is available. For the first issue, a sparse clustering method can be used to detect the subset of features necessary for clustering; for the second, a semi-supervised method can use the labelled data to create constraints and improve the clustering solution. In this paper we propose a K-Means inspired algorithm that combines both techniques. We show that the algorithm maintains the high performance of other semi-supervised algorithms while preserving the ability to distinguish informative from uninformative features. We examine its performance on synthetic and real-world data sets, using a series of scenarios with different numbers and types of constraints as well as two different clustering initialisation methods.
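
To make the combination of the two ideas concrete, the following is a minimal illustrative sketch and not the algorithm proposed in the paper, whose details the abstract does not give. It assumes (a) the standard sparse K-Means feature-weight update, in which per-feature between-cluster sums of squares are soft-thresholded under an L1 budget `s`, and (b) the simplest possible constraint type, in which labelled points are pinned to the cluster matching their label. The names `update_weights`, `semi_supervised_sparse_kmeans`, `labels`, and `s` are hypothetical and chosen for illustration only.

```python
import numpy as np


def update_weights(bcss, s, n_iter=60):
    """Feature weights maximising w.bcss subject to ||w||_2 <= 1, ||w||_1 <= s, w >= 0.

    Assumes 1 <= s <= sqrt(n_features). Uses bisection on the soft-threshold level.
    """
    bcss = np.maximum(bcss, 0.0)

    def normalised(delta):
        w = np.maximum(bcss - delta, 0.0)  # soft-threshold (inputs are non-negative)
        norm = np.linalg.norm(w)
        return w / norm if norm > 0 else w

    w = normalised(0.0)
    if w.sum() <= s:  # L1 budget already satisfied, no thresholding needed
        return w
    lo, hi = 0.0, bcss.max()
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if normalised(mid).sum() > s:
            lo = mid
        else:
            hi = mid
    return normalised(hi)


def semi_supervised_sparse_kmeans(X, k, labels, s, n_outer=10, n_inner=20, seed=0):
    """labels: integer array, -1 for unlabelled points, otherwise a class in 0..k-1.

    Assumption: labelled points are hard-constrained to the cluster of their class.
    Returns (assignments, feature_weights).
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    fixed = labels >= 0
    w = np.full(p, 1.0 / np.sqrt(p))  # start from uniform feature weights

    # Seed each centroid with the mean of its labelled points, else a random point.
    centroids = np.empty((k, p))
    for c in range(k):
        members = X[labels == c]
        centroids[c] = members.mean(axis=0) if len(members) else X[rng.integers(n)]

    assign = labels.copy()
    for _ in range(n_outer):
        # Step 1: weighted K-Means with labelled points held fixed.
        for _ in range(n_inner):
            d = (((X[:, None, :] - centroids[None, :, :]) ** 2) * w).sum(axis=2)
            new_assign = d.argmin(axis=1)
            new_assign[fixed] = labels[fixed]
            if np.array_equal(new_assign, assign):
                break
            assign = new_assign
            for c in range(k):
                if np.any(assign == c):
                    centroids[c] = X[assign == c].mean(axis=0)

        # Step 2: per-feature between-cluster SS = total SS - within-cluster SS.
        tss = ((X - X.mean(axis=0)) ** 2).sum(axis=0)
        wss = sum(
            ((X[assign == c] - X[assign == c].mean(axis=0)) ** 2).sum(axis=0)
            for c in range(k)
            if np.any(assign == c)
        )
        w = update_weights(tss - wss, s)
    return assign, w
```

In this sketch, features whose returned weight is zero or close to zero are the ones treated as uninformative, and a smaller `s` drives more weights to exactly zero. Pairwise must-link or cannot-link constraints, which a semi-supervised method may also derive from labelled data, would need a different assignment step and are not covered here.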
