
Performance of Johnson-Lindenstrauss Transform for k-Means and k-Medians Clustering

Symposium on the Theory of Computing (STOC), 2018
Abstract

Consider an instance of Euclidean $k$-means or $k$-medians clustering. We show that the cost of the optimal solution is preserved up to a factor of $(1+\varepsilon)$ under a projection onto a random $O(\log(k / \varepsilon) / \varepsilon^2)$-dimensional subspace. Further, the cost of every clustering is preserved within $(1+\varepsilon)$. More generally, our result applies to any dimension reduction map satisfying a mild sub-Gaussian-tail condition. Our bound on the dimension is nearly optimal. Additionally, our result applies to Euclidean $k$-clustering with the distances raised to the $p$-th power for any constant $p$. For $k$-means, our result resolves an open problem posed by Cohen, Elder, Musco, Musco, and Persu (STOC 2015); for $k$-medians, it answers a question raised by Kannan.
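
As a rough numerical illustration of the cost-preservation claim, the following Python sketch projects synthetic points with a scaled Gaussian matrix and compares the $k$-means cost of one fixed clustering before and after. It is a minimal sketch under stated assumptions, not the paper's construction or proof: the helper names `kmeans_cost` and `gaussian_projection`, the synthetic data, and the constant 4 in the target dimension are all hypothetical choices, and checking a single clustering says nothing about the optimal solution or about all clusterings simultaneously.

```python
import numpy as np

def kmeans_cost(points, labels):
    """Sum of squared distances from each point to its cluster centroid."""
    cost = 0.0
    for c in np.unique(labels):
        cluster = points[labels == c]
        cost += ((cluster - cluster.mean(axis=0)) ** 2).sum()
    return cost

def gaussian_projection(points, target_dim, rng):
    """Map points to target_dim dimensions with an i.i.d. Gaussian matrix.

    Entries have variance 1/target_dim, so squared norms are preserved
    in expectation. This is one standard Johnson-Lindenstrauss map,
    chosen here for illustration.
    """
    d = points.shape[1]
    G = rng.normal(0.0, 1.0 / np.sqrt(target_dim), size=(d, target_dim))
    return points @ G

rng = np.random.default_rng(0)
n, d, k, eps = 300, 1000, 5, 0.5

# Target dimension of order log(k/eps)/eps^2; the constant 4 is an
# assumption made for this sketch, not a value taken from the paper.
m = int(np.ceil(4 * np.log(k / eps) / eps**2))

# Synthetic data: k well-separated centers plus unit Gaussian noise.
centers = rng.normal(0.0, 5.0, size=(k, d))
labels = rng.integers(0, k, size=n)
points = centers[labels] + rng.normal(size=(n, d))

projected = gaussian_projection(points, m, rng)
ratio = kmeans_cost(projected, labels) / kmeans_cost(points, labels)
print(f"target dimension m = {m}, cost ratio = {ratio:.3f}")
```

The cost is evaluated with the same partition `labels` in both spaces, with centroids recomputed after projection, which matches the notion of the cost of a clustering being preserved. For the $k$-medians analogue one would replace the squared distances with plain distances to an (approximate) geometric median, which this sketch does not attempt.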
