
A Constant-Factor Bi-Criteria Approximation Guarantee for $k$-means++

Abstract

This paper studies the $k$-means++ algorithm for clustering as well as the class of $D^\ell$ sampling algorithms to which $k$-means++ belongs. It is shown that for any constant factor $\beta > 1$, selecting $\beta k$ cluster centers by $D^\ell$ sampling yields a constant-factor approximation to the optimal clustering with $k$ centers, in expectation and without conditions on the dataset. This result extends the previously known $O(\log k)$ guarantee for the case $\beta = 1$ to the constant-factor bi-criteria regime. It also improves upon an existing constant-factor bi-criteria result that holds only with constant probability.
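
The sampling scheme described in the abstract can be illustrated with a minimal sketch. It assumes the usual definition of $D^\ell$ sampling: the first center is drawn uniformly at random, and each later center is drawn with probability proportional to the $\ell$-th power of its distance to the nearest center chosen so far ($\ell = 2$ gives $k$-means++ seeding). The function names, the rounding of $\beta k$ up to an integer, and the pure-Python point representation are illustrative assumptions, not taken from the paper.

```python
import math
import random


def d_ell_sampling(points, k, beta=2.0, ell=2.0, seed=None):
    """Pick ceil(beta * k) centers from `points` by D^ell sampling.

    Hypothetical helper: the name, signature, and rounding rule are
    assumptions made for this illustration.
    """
    rng = random.Random(seed)
    n_centers = min(len(points), math.ceil(beta * k))

    # First center: uniform at random over the data points.
    first = rng.randrange(len(points))
    chosen = [first]

    # dist[i] = Euclidean distance from points[i] to its nearest chosen center.
    dist = [math.dist(p, points[first]) for p in points]

    while len(chosen) < n_centers:
        # Sampling weight of each point: (distance to nearest center) ** ell.
        weights = [d ** ell for d in dist]
        if sum(weights) == 0.0:
            break  # every point coincides with an already chosen center
        idx = rng.choices(range(len(points)), weights=weights, k=1)[0]
        chosen.append(idx)
        # Update nearest-center distances with the newly added center.
        dist = [min(d, math.dist(p, points[idx])) for d, p in zip(dist, points)]

    return [points[i] for i in chosen]


def clustering_cost(points, centers, ell=2.0):
    """Sum over points of (distance to nearest center) ** ell."""
    return sum(min(math.dist(p, c) for c in centers) ** ell for p in points)
```

With `beta=1.0` and `ell=2.0` this reduces to ordinary $k$-means++ seeding; the bi-criteria setting compares `clustering_cost` of the $\beta k$ sampled centers against the optimal cost achievable with only $k$ centers.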
