
On Euclidean $k$-Means Clustering with $\alpha$-Center Proximity

Abstract

$k$-means clustering is NP-hard in the worst case, but previous work has shown efficient algorithms assuming the optimal $k$-means clusters are \emph{stable} under additive or multiplicative perturbation of the data. This has two caveats. First, we do not know how to efficiently verify this property of optimal solutions that are NP-hard to compute in the first place. Second, the stability assumptions required for polynomial-time $k$-means algorithms are often unreasonable when compared to the ground-truth clusters in real-world data. A consequence of multiplicative perturbation resilience is \emph{center proximity}, that is, every point is closer to the center of its own cluster than to the center of any other cluster, by some multiplicative factor $\alpha > 1$. We study the problem of minimizing the Euclidean $k$-means objective only over clusterings that satisfy $\alpha$-center proximity. We give a simple algorithm to find the optimal $\alpha$-center-proximal $k$-means clustering in running time exponential in $k$ and $1/(\alpha - 1)$ but linear in the number of points and the dimension. We define an analogous $\alpha$-center proximity condition for outliers, and give similar algorithmic guarantees for $k$-means with outliers and $\alpha$-center proximity. On the hardness side, we show that for any $\alpha' > 1$, there exist an $\alpha \leq \alpha'$ (with $\alpha > 1$) and an $\varepsilon_0 > 0$ such that minimizing the $k$-means objective over clusterings that satisfy $\alpha$-center proximity is NP-hard to approximate within a multiplicative $(1+\varepsilon_0)$ factor.
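
To make the constraint in the abstract precise, here is one standard way to write it (the notation below is illustrative and may differ slightly from the paper's, e.g. in whether the inequality is strict): for a clustering $C_1, \dots, C_k$ of a point set $X \subset \mathbb{R}^d$ with centers $c_1, \dots, c_k$, $\alpha$-center proximity requires
\[
\alpha \,\lVert x - c_i \rVert \;<\; \lVert x - c_j \rVert \qquad \text{for every } x \in C_i \text{ and every } j \neq i,
\]
and the Euclidean $k$-means objective being minimized over such clusterings is $\sum_{i=1}^{k} \sum_{x \in C_i} \lVert x - c_i \rVert^2$.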
