Consistent clustering using $\ell_1$ fusion penalty

Abstract

We study a convex regularized clustering framework that minimizes the within cluster sum of squares under an $\ell_1$ fusion constraint on the cluster centroids. We track the entire solution path through a regularization path algorithm. Analyzing the associated population clustering procedure, we provide new insights on how the $\ell_1$ fusion regularization incrementally induces partitions in the sample space. Based on these new perspectives, we propose a refined path algorithm, which in large samples can consistently detect the number of clusters and the associated partition of the space. Our method of analysis is fairly general and works for a wide range of population densities. Explicit characterization of the consistency conditions is provided for the case of Gaussian mixtures. On simulated data sets, we compare the performance of our method with a number of existing cluster estimation and modality assessment algorithms, and obtain encouraging results. We also demonstrate the applicability of our clustering approach for the detection of cellular subpopulations in a single-cell protein expression based virology study.
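
As an illustration only, the kind of criterion the abstract describes can be sketched for one-dimensional data. The abstract does not give the exact objective, so the penalized (Lagrangian) form below, the 1/2 scaling, the one-centroid-per-observation setup, and the names `fusion_objective`, `centroids`, and `lam` are all assumptions made for this sketch, not the authors' implementation.

```python
import numpy as np


def fusion_objective(x, centroids, lam):
    """Evaluate an assumed l1-fusion clustering criterion in one dimension.

    Assumed form (not taken from the paper):
        0.5 * sum_i (x_i - a_i)^2 + lam * sum_{i<j} |a_i - a_j|
    where a_i is the centroid assigned to observation x_i.
    """
    x = np.asarray(x, dtype=float)
    a = np.asarray(centroids, dtype=float)
    if x.shape != a.shape or x.ndim != 1:
        raise ValueError("x and centroids must be 1-D arrays of equal length")

    # Within-cluster sum of squares: each point against its own centroid.
    fit = 0.5 * np.sum((x - a) ** 2)

    # l1 fusion penalty: absolute differences over all centroid pairs i < j.
    pairwise = np.abs(a[:, None] - a[None, :])
    penalty = lam * np.sum(np.triu(pairwise, k=1))

    return fit + penalty


x = np.array([0.0, 1.0, 10.0])
unfused = fusion_objective(x, x, lam=1.0)                       # no fusion: fit term is zero
fused = fusion_objective(x, np.full(3, x.mean()), lam=1.0)      # fully fused: penalty is zero
```

Under this assumed form, centroids that share a value contribute nothing to the penalty, so increasing `lam` favors solutions in which groups of centroids coincide; the distinct centroid values then define the clusters.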
