
A Strongly Consistent Sparse k-means Clustering with Direct l_1 Penalization on Variable Weights

Abstract

We propose the Lasso Weighted k-means (LW-k-means) algorithm as a simple yet efficient sparse clustering procedure for high-dimensional data, where the number of features (p) can be much larger than the number of observations (n). In the LW-k-means algorithm, we introduce a lasso-based penalty term directly on the feature weights to incorporate feature selection in the framework of sparse clustering. LW-k-means does not make any distributional assumptions about the given dataset and thus induces a non-parametric method for feature selection. We also analytically investigate the convergence of the underlying optimization procedure in LW-k-means and establish the strong consistency of our algorithm. LW-k-means is tested on several real-life and synthetic datasets, and through detailed experimental analysis we find that its performance is highly competitive against some state-of-the-art procedures for clustering and feature selection, not only in terms of clustering accuracy but also with respect to computational time.
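To illustrate the general idea described above, the following is a minimal NumPy sketch of a weighted k-means loop in which non-negative feature weights receive an l_1 penalty handled by soft-thresholding. The function name, the dispersion-based score used in the weight update, and the normalization step are illustrative assumptions, not the paper's exact LW-k-means update rules.

```python
import numpy as np

def soft_threshold(x, lam):
    """Proximal operator of the l1 norm (element-wise soft-thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def lasso_weighted_kmeans(X, k, lam, n_iter=50, seed=0):
    """Illustrative sketch: alternate weighted k-means assignments/centroid
    updates with an l1-penalized update of non-negative feature weights.
    The weight-update rule below is a plausible stand-in, not the paper's."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    w = np.ones(p) / p                                  # feature weights
    centers = X[rng.choice(n, k, replace=False)]
    for _ in range(n_iter):
        # 1) assign points using weighted squared Euclidean distances
        d = ((X[:, None, :] - centers[None, :, :]) ** 2 * w).sum(axis=2)
        labels = d.argmin(axis=1)
        # 2) update centroids as per-cluster means
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
        # 3) per-feature within-cluster dispersion
        disp = ((X - centers[labels]) ** 2).sum(axis=0) / n
        # 4) l1-penalized weight update: soft-threshold a dispersion-based
        #    score (features with low dispersion get larger weights)
        score = disp.max() - disp
        w = soft_threshold(score, lam)
        if w.sum() > 0:
            w = w / w.sum()
    return labels, w, centers
```

For example, on data where only a few of many features carry cluster structure, a larger value of the penalty parameter lam drives more feature weights exactly to zero, which is the sparsity effect of the l_1 penalty that the abstract refers to.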
