Clustering under Perturbation Resilience

Abstract

Recently, Bilu and Linial formalized an implicit assumption often made when choosing a clustering objective: that the optimal clustering for the objective should be preserved under small multiplicative perturbations to the distances between points. They showed that for max-cut clustering it is possible to circumvent NP-hardness and obtain polynomial-time algorithms for instances resilient to large (factor $O(\sqrt{n})$) perturbations. Subsequently, Awasthi et al. considered center-based objectives, giving algorithms for instances resilient to $O(1)$-factor perturbations. In this paper, we greatly advance this line of work. For center-based objectives, we present an algorithm that can optimally cluster instances resilient to $(1 + \sqrt{2})$-factor perturbations, solving an open problem of Awasthi et al. For $k$-median, a commonly used center-based objective, we additionally give algorithms under a more relaxed assumption in which the optimal solution is allowed to change on a small $\epsilon$ fraction of the points after perturbation. We give the first known bounds for this more realistic and more general setting. We also provide positive results for min-sum clustering, which is not center-based and is generally a much harder objective than $k$-median. Our algorithms are based on new linkage criteria that may be of independent interest. Additionally, we give sublinear-time algorithms that return an implicit clustering given access only to a small random sample.
