
Multi-Swap k-Means++

Neural Information Processing Systems (NeurIPS), 2023
Abstract

The k-means++ algorithm of Arthur and Vassilvitskii (SODA 2007) is often the practitioners' choice for optimizing the popular k-means clustering objective and is known to give an O(log k)-approximation in expectation. To obtain higher-quality solutions, Lattanzi and Sohler (ICML 2019) proposed augmenting k-means++ with O(k log log k) local search steps, using the k-means++ sampling distribution, to yield a c-approximation to the k-means clustering problem, where c is a large absolute constant. Here we generalize and extend their local search algorithm by considering larger and more sophisticated local search neighborhoods, allowing multiple centers to be swapped at the same time. Our algorithm achieves a 9 + ε approximation ratio, which is the best possible for local search. Importantly, our approach yields substantial practical benefits: we show significant quality improvements over the approach of Lattanzi and Sohler (ICML 2019) on several datasets.
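To make the pipeline concrete, here is a minimal sketch of k-means++ seeding (D² sampling) followed by a sampling-based local search in the spirit of Lattanzi and Sohler's single-swap variant: each step draws a candidate center from the k-means++ distribution and keeps the best single swap if it lowers the cost. This is an illustrative simplification on 2D points, not the paper's multi-swap algorithm; all function names (`kmeans_cost`, `d2_sample`, `kmeanspp_local_search`) are hypothetical.

```python
import random

def kmeans_cost(points, centers):
    # Sum of squared distances from each point to its nearest center.
    return sum(min((px - cx) ** 2 + (py - cy) ** 2 for cx, cy in centers)
               for px, py in points)

def d2_sample(points, centers, rng):
    # k-means++ (D^2) sampling: pick a point with probability proportional
    # to its squared distance to the nearest current center.
    weights = [min((px - cx) ** 2 + (py - cy) ** 2 for cx, cy in centers)
               for px, py in points]
    total = sum(weights)
    if total == 0:  # every point is already a center
        return rng.choice(points)
    return rng.choices(points, weights=weights, k=1)[0]

def kmeanspp_local_search(points, k, steps, seed=0):
    rng = random.Random(seed)
    # Seeding: first center uniform at random, the rest via D^2 sampling.
    centers = [rng.choice(points)]
    while len(centers) < k:
        centers.append(d2_sample(points, centers, rng))
    # Local search: sample a candidate via D^2 sampling, try swapping it
    # in for each existing center, keep the best cost-improving swap.
    # (The multi-swap generalization would consider swapping several
    # centers per step.)
    for _ in range(steps):
        candidate = d2_sample(points, centers, rng)
        best_cost = kmeans_cost(points, centers)
        best = None
        for i in range(k):
            trial = centers[:i] + [candidate] + centers[i + 1:]
            c = kmeans_cost(points, trial)
            if c < best_cost:
                best_cost, best = c, trial
        if best is not None:
            centers = best
    return centers
```

On well-separated data, the local search steps quickly correct a seeding that placed two centers in the same cluster, which is the failure mode they are designed to repair.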
