On Euclidean $k$-Means Clustering with $\alpha$-Center Proximity

28 April 2018

Abstract

$k$-means clustering is NP-hard in the worst case, but previous work has given efficient algorithms under the assumption that the optimal $k$-means clusters are stable under additive or multiplicative perturbation of the data. This has two caveats. First, we do not know how to efficiently verify this property of optimal solutions, which are NP-hard to compute in the first place. Second, the stability assumptions required for polynomial-time $k$-means algorithms are often unreasonable when compared to the ground-truth clusters in real-world data. A consequence of multiplicative perturbation resilience is center proximity: every point is closer to the center of its own cluster than to the center of any other cluster, by some multiplicative factor $\alpha > 1$. We study the problem of minimizing the Euclidean $k$-means objective only over clusterings that satisfy $\alpha$-center proximity. We give a simple algorithm that finds the optimal $\alpha$-center-proximal $k$-means clustering in running time exponential in $k$ and $1/(\alpha - 1)$ but linear in the number of points and the dimension. We define an analogous $\alpha$-center proximity condition for outliers, and give similar algorithmic guarantees for $k$-means with outliers and $\alpha$-center proximity. On the hardness side, we show that for any $\alpha' > 1$ there exist an $\alpha$ with $1 < \alpha \leq \alpha'$ and an $\varepsilon_0 > 0$ such that minimizing the $k$-means objective over clusterings that satisfy $\alpha$-center proximity is NP-hard to approximate within a multiplicative factor of $(1+\varepsilon_0)$.
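To make the center proximity condition concrete, the following minimal sketch checks whether a given clustering satisfies it and computes its $k$-means cost. It is not the paper's algorithm. It assumes that each cluster's center is its mean and that the condition is the strict inequality $\alpha \, \|x - \mu_i\| < \|x - \mu_j\|$ for every point $x$ in cluster $i$ and every $j \neq i$; the function names are hypothetical.

```python
from math import dist


def centroid(points):
    """Coordinate-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans_cost(clusters):
    """Sum of squared Euclidean distances from each point to its cluster mean."""
    total = 0.0
    for pts in clusters:
        mu = centroid(pts)
        total += sum(dist(p, mu) ** 2 for p in pts)
    return total


def satisfies_center_proximity(clusters, alpha):
    """Check alpha-center proximity (assumed form: strict inequality,
    centers taken to be the cluster means)."""
    centers = [centroid(pts) for pts in clusters]
    for i, pts in enumerate(clusters):
        for p in pts:
            own = dist(p, centers[i])
            for j, c in enumerate(centers):
                # Violated if another center is within alpha times the
                # distance to the point's own center.
                if j != i and dist(p, c) <= alpha * own:
                    return False
    return True


# Two well-separated clusters on a line (illustrative data only).
clusters = [[(0.0, 0.0), (1.0, 0.0)], [(10.0, 0.0), (11.0, 0.0)]]
print(kmeans_cost(clusters))
print(satisfies_center_proximity(clusters, alpha=2.0))
print(satisfies_center_proximity(clusters, alpha=20.0))
```

Verifying the condition for a given clustering takes time linear in the number of points for fixed $k$ and dimension. The hard part addressed in the abstract is minimizing the cost over all clusterings that pass this check.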
