ResearchTrend.AI

arXiv:1112.0826

Clustering under Perturbation Resilience

5 December 2011
Maria-Florina Balcan
Yingyu Liang
Abstract

Recently, Bilu and Linial \cite{BL} formalized an implicit assumption often made when choosing a clustering objective: that the optimum clustering for the objective should be preserved under small multiplicative perturbations to the distances between points. They showed that for max-cut clustering it is possible to circumvent NP-hardness and obtain polynomial-time algorithms for instances resilient to large (factor O(√n)) perturbations, and subsequently Awasthi et al. \cite{ABS10} considered center-based objectives, giving algorithms for instances resilient to O(1)-factor perturbations. In this paper, we greatly advance this line of work. For the k-median objective, we present an algorithm that can optimally cluster instances resilient to (1+√2)-factor perturbations, solving an open problem of Awasthi et al. \cite{ABS10}. We additionally give algorithms for a more relaxed assumption in which the optimal solution is allowed to change on a small ε fraction of the points after perturbation. We give the first bounds known for this more realistic and more general setting. We also provide positive results for min-sum clustering, which is a generally much harder objective than k-median (and is also non-center-based). Our algorithms are based on new linkage criteria that may be of independent interest. Additionally, we give sublinear-time algorithms that return an implicit clustering using only access to a small random sample.
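The resilience condition can be probed empirically on a toy instance. The sketch below (not the paper's algorithm; the point set, the number of random trials, and all helper names are illustrative choices) brute-forces the optimal k-median 2-clustering of five points on a line, then checks that random multiplicative perturbations of the pairwise distances by factors up to 1+√2 leave that optimum unchanged, in the spirit of the Bilu–Linial definition:

```python
import itertools
import random

def kmedian_cost(dist, clustering):
    # Each cluster pays the cheapest choice of center among its own points.
    return sum(min(sum(dist[c][p] for p in cluster) for c in cluster)
               for cluster in clustering)

def partitions(points, k):
    # All splits of `points` into k non-empty clusters
    # (the first point's label is fixed to avoid relabeled duplicates).
    first, rest = points[0], points[1:]
    for labels in itertools.product(range(k), repeat=len(rest)):
        clusters = [[] for _ in range(k)]
        clusters[0].append(first)
        for p, lab in zip(rest, labels):
            clusters[lab].append(p)
        if all(clusters):
            yield clusters

def optimal_clustering(dist, points, k):
    best = min(partitions(points, k), key=lambda c: kmedian_cost(dist, c))
    return frozenset(frozenset(c) for c in best)  # label-free canonical form

# Toy instance: two well-separated groups on the line.
coords = [0.0, 0.1, 0.2, 5.0, 5.1]
n = len(coords)
dist = [[abs(a - b) for b in coords] for a in coords]
opt = optimal_clustering(dist, list(range(n)), k=2)

# Empirically probe (1+sqrt(2))-resilience: inflate each distance by an
# independent factor in [1, alpha] and re-solve.  (Random trials can only
# refute resilience, not prove it, and the perturbed instance need not
# remain a metric.)
random.seed(0)
alpha = 1 + 2 ** 0.5
stable = True
for _ in range(100):
    pert = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            pert[i][j] = pert[j][i] = dist[i][j] * random.uniform(1, alpha)
    if optimal_clustering(pert, list(range(n)), k=2) != opt:
        stable = False
print(sorted(sorted(c) for c in opt), stable)
# prints: [[0, 1, 2], [3, 4]] True
```

Here the two groups are separated so widely that no factor-(1+√2) inflation of the within-group distances can beat the cost of merging points across groups, so the optimal partition survives every trial.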
