On Euclidean $k$-Means Clustering with $\alpha$-Center Proximity

$k$-means clustering is NP-hard in the worst case, but previous work has shown efficient algorithms assuming the optimal $k$-means clusters are \emph{stable} under additive or multiplicative perturbation of the data. This approach has two caveats. First, we do not know how to efficiently verify this property of optimal solutions that are NP-hard to compute in the first place. Second, the stability assumptions required for polynomial-time $k$-means algorithms are often unreasonable when compared to the ground-truth clusters in real-world data. A consequence of multiplicative perturbation resilience is \emph{center proximity}: every point is closer to the center of its own cluster than to the center of any other cluster, by some multiplicative factor $\alpha > 1$. We study the problem of minimizing the Euclidean $k$-means objective only over clusterings that satisfy $\alpha$-center proximity. We give a simple algorithm to find the optimal $\alpha$-center-proximal $k$-means clustering in running time exponential in $k$ and $1/(\alpha - 1)$ but linear in the number of points and the dimension. We define an analogous $\alpha$-center proximity condition for outliers, and give similar algorithmic guarantees for $k$-means with outliers and $\alpha$-center proximity. On the hardness side, we show that for any $\alpha' > 1$, there exist an $\alpha \leq \alpha'$ (with $\alpha > 1$) and an $\varepsilon_0 > 0$ such that minimizing the $k$-means objective over clusterings that satisfy $\alpha$-center proximity is NP-hard to approximate within a multiplicative $(1 + \varepsilon_0)$ factor.
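To make the constrained problem concrete, here is a minimal sketch, assuming that cluster centers are the cluster means and that $\alpha$-center proximity means $\lVert x - \mu_j \rVert > \alpha \lVert x - \mu_i \rVert$ for every point $x$ in cluster $i$ and every $j \neq i$ (the strict inequality is an assumption here). The helper names `centroid`, `kmeans_cost`, `is_center_proximal`, and `best_center_proximal` are hypothetical. The search is plain brute force over all labelings, so its running time is exponential in the number of points; it only illustrates the objective and the constraint and is *not* the paper's algorithm, which is linear in the number of points.

```python
import itertools
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def centroid(cluster: Sequence[Point]) -> Point:
    """Mean of a non-empty cluster (the optimal k-means center for it)."""
    d = len(cluster[0])
    return tuple(sum(p[t] for p in cluster) / len(cluster) for t in range(d))


def kmeans_cost(clusters: Sequence[Sequence[Point]]) -> float:
    """Sum of squared Euclidean distances from each point to its cluster mean."""
    total = 0.0
    for c in clusters:
        mu = centroid(c)
        total += sum(math.dist(p, mu) ** 2 for p in c)
    return total


def is_center_proximal(clusters: Sequence[Sequence[Point]], alpha: float) -> bool:
    """Assumed definition: for x in cluster i and every j != i,
    dist(x, mu_j) > alpha * dist(x, mu_i)."""
    centers = [centroid(c) for c in clusters]
    for i, c in enumerate(clusters):
        for x in c:
            d_own = math.dist(x, centers[i])
            for j, mu in enumerate(centers):
                if j != i and math.dist(x, mu) <= alpha * d_own:
                    return False
    return True


def best_center_proximal(
    points: Sequence[Point], k: int, alpha: float
) -> Tuple[float, Optional[List[List[Point]]]]:
    """Brute force: try every labeling into exactly k non-empty clusters,
    keep those satisfying alpha-center proximity, return the cheapest."""
    best_cost, best = math.inf, None
    for labels in itertools.product(range(k), repeat=len(points)):
        if len(set(labels)) != k:  # skip labelings with an empty cluster
            continue
        clusters = [[p for p, l in zip(points, labels) if l == c] for c in range(k)]
        if not is_center_proximal(clusters, alpha):
            continue
        cost = kmeans_cost(clusters)
        if cost < best_cost:
            best_cost, best = cost, clusters
    return best_cost, best
```

For example, on the one-dimensional points `[(0,), (1,), (10,), (11,)]` with `k=2` and `alpha=2.0`, the search keeps the clustering `{0, 1}, {10, 11}` (centers 0.5 and 10.5, cost 1.0), while a mixed clustering such as `{0, 10}, {1, 11}` fails the proximity check. If no clustering satisfies the constraint, the function returns `(math.inf, None)`.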
View on arXiv