A Strongly Consistent Sparse $k$ -means Clustering with Direct $l_1$ Penalization on Variable Weights

24 March 2019

Abstract

We propose the Lasso Weighted $k$ -means ( $LW$ - $k$ -means) algorithm as a simple yet efficient sparse clustering procedure for high-dimensional data where the number of features ( $p$ ) can be much larger compared to the number of observations ( $n$ ). In the $LW$ - $k$ -means algorithm, we introduce a lasso-based penalty term, directly on the feature weights to incorporate feature selection in the framework of sparse clustering. $LW$ - $k$ -means does not make any distributional assumption of the given dataset and thus, induces a non-parametric method for feature selection. We also analytically investigate the convergence of the underlying optimization procedure in $LW$ - $k$ -means and establish the strong consistency of our algorithm. $LW$ - $k$ -means is tested on several real-life and synthetic datasets and through detailed experimental analysis, we find that the performance of the method is highly competitive against some state-of-the-art procedures for clustering and feature selection, not only in terms of clustering accuracy but also with respect to computational time.

View on arXiv

Comments on this paper

A Strongly Consistent Sparse kkk-means Clustering with Direct l1l_1l1​ Penalization on Variable Weights

A Strongly Consistent Sparse $k$ -means Clustering with Direct $l_1$ Penalization on Variable Weights