
No-Substitution $k$-means Clustering with Optimal Center Complexity and Low Memory

International Conference on Algorithmic Learning Theory (ALT), 2021
Abstract

We consider $k$-means clustering in the online no-substitution setting, where one must decide whether to take each data point $x_t$ as a center immediately upon streaming it and cannot remove centers once taken. Our work is focused on the \emph{arbitrary-order} assumption, where there are no restrictions on how the points $X$ are ordered or generated. Algorithms in this setting are evaluated with respect to their approximation ratio compared to the optimal clustering cost, the number of centers they select, and their memory usage. Recently, Bhattacharjee and Moshkovitz (2020) defined a parameter, $\mathrm{Lower}_{\alpha,k}(X)$, that governs the minimum number of centers any $\alpha$-approximation clustering algorithm, allowed any amount of memory, must take given input $X$. To complement their result, we give the first algorithm that takes $\tilde{O}(\mathrm{Lower}_{\alpha,k}(X))$ centers (hiding factors of $k$ and $\log n$) while simultaneously achieving a constant approximation and using $\tilde{O}(k)$ memory in addition to the memory required to save the centers. Our algorithm shows that in the no-substitution setting, it is possible to take an order-optimal number of centers while using little additional memory.
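To make the no-substitution protocol concrete, here is a minimal Python sketch, assuming Euclidean points given as tuples of floats. The names `no_substitution_stream`, `threshold_rule`, and `kmeans_cost` are hypothetical, and the distance-threshold rule is a simple illustrative stand-in, not the algorithm from this paper. It only shows the shape of the setting: each point is seen once, the decision to take it is made immediately and is irrevocable, and a run is scored by the $k$-means cost of the centers it selected (which an $\alpha$-approximation must keep within $\alpha$ times the optimal cost achievable with $k$ centers).

```python
from typing import Callable, Iterable, List, Sequence, Tuple

Point = Tuple[float, ...]
# A take rule sees the current point and the centers taken so far, and answers now.
TakeRule = Callable[[Sequence[float], List[Point]], bool]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_cost(points: Sequence[Sequence[float]], centers: Sequence[Point]) -> float:
    """k-means cost: sum over points of the squared distance to the nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def no_substitution_stream(stream: Iterable[Sequence[float]], take_rule: TakeRule) -> List[Point]:
    """Run the no-substitution protocol over a stream of points.

    Each point is seen exactly once. The decision to take it as a center is
    made immediately, and taken centers are never removed.
    """
    centers: List[Point] = []
    for x in stream:
        if take_rule(x, centers):
            centers.append(tuple(x))  # irrevocable: no later step removes it
    return centers


def threshold_rule(threshold: float) -> TakeRule:
    """Hypothetical toy rule (not the paper's method): take x if its squared
    distance to every center taken so far exceeds `threshold`.
    It keeps no state beyond the centers themselves."""
    def rule(x: Sequence[float], centers: List[Point]) -> bool:
        if not centers:
            return True
        return min(sq_dist(x, c) for c in centers) > threshold
    return rule


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.1, 0.0), (10.0, 10.0), (10.1, 10.0), (0.0, 0.2)]
    centers = no_substitution_stream(pts, threshold_rule(1.0))
    print(centers)  # [(0.0, 0.0), (10.0, 10.0)]
    print(kmeans_cost(pts, centers))
```

Because decisions are never revisited, which points become centers, and how many, can depend on the order in which the stream arrives; under the arbitrary-order assumption that order is unrestricted.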
