Near-optimal Algorithms for Explainable k-Medians and k-Means

Abstract

We consider the problem of explainable $k$-medians and $k$-means introduced by Dasgupta, Frost, Moshkovitz, and Rashtchian (ICML 2020). In this problem, our goal is to find a threshold decision tree that partitions data into $k$ clusters and minimizes the $k$-medians or $k$-means objective. The obtained clustering is easy to interpret because every decision node of a threshold tree splits data based on a single feature into two groups. We propose a new algorithm for this problem which is $\tilde O(\log k)$ competitive with $k$-medians with the $\ell_1$ norm and $\tilde O(k)$ competitive with $k$-means. This is an improvement over the previous guarantees of $O(k)$ and $O(k^2)$ by Dasgupta et al. (2020). We also provide a new algorithm which is $O(\log^{3/2} k)$ competitive for $k$-medians with the $\ell_2$ norm. Our first algorithm is near-optimal: Dasgupta et al. (2020) showed a lower bound of $\Omega(\log k)$ for $k$-medians; in this work, we prove a lower bound of $\tilde\Omega(k)$ for $k$-means. We also provide a lower bound of $\Omega(\log k)$ for $k$-medians with the $\ell_2$ norm.
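To make the setting concrete, here is a minimal illustrative sketch (not the paper's algorithm) of what a threshold decision tree clustering looks like: each internal node tests a single feature against a threshold, so every leaf is an axis-aligned region, and the $k$-medians objective with $\ell_1$ norm sums each point's $\ell_1$ distance to its cluster's center. The data, thresholds, and centers below are made up for illustration.

```python
def assign_cluster(point):
    """A tiny hand-built threshold tree for k = 3: each internal node
    compares one feature to a threshold, so every leaf (cluster) is an
    axis-aligned box -- this is what makes the clustering explainable."""
    x, y = point
    if x <= 3.0:          # root node: split on feature 0
        return 0          # leaf / cluster 0
    elif y <= 2.0:        # right child: split on feature 1
        return 1          # leaf / cluster 1
    else:
        return 2          # leaf / cluster 2

def l1_cost(points, centers):
    """k-medians objective with l1 norm: sum over all points of the l1
    distance to the center of the cluster the tree assigns the point to."""
    total = 0.0
    for p in points:
        c = centers[assign_cluster(p)]
        total += sum(abs(pi - ci) for pi, ci in zip(p, c))
    return total

# Hypothetical data and centers, purely for illustration.
points = [(1.0, 1.0), (2.0, 4.0), (5.0, 1.0), (6.0, 5.0)]
centers = [(1.5, 2.5), (5.0, 1.0), (6.0, 5.0)]
print(l1_cost(points, centers))  # prints 4.0
```

The competitive ratios in the abstract compare the cost of the best clustering achievable by such a tree against the unconstrained optimum, where each point may be assigned to its nearest center with no tree structure.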
