A semi-supervised sparse K-Means algorithm
We consider the problem of clustering data in which the quality of the features is unknown but a small amount of labelled data is available. To address the first issue, a sparse clustering method can be used to detect the subset of features needed for clustering; to address the second, a semi-supervised method can use the labelled data to create constraints and improve the clustering solution. In this paper we propose a K-Means inspired algorithm that combines both techniques. We show that the algorithm maintains the high performance of other semi-supervised algorithms while preserving the ability to distinguish informative from uninformative features. We examine the performance of the algorithm on synthetic and real-world data sets, using a series of scenarios with different numbers and types of constraints as well as two different clustering initialisation methods.