Mini-Batch Kernel $k$-means

Abstract

We present the first mini-batch kernel $k$-means algorithm, offering an order of magnitude improvement in running time compared to the full-batch algorithm. A single iteration of our algorithm takes $\widetilde{O}(kb^2)$ time, significantly faster than the $O(n^2)$ time required by full-batch kernel $k$-means, where $n$ is the dataset size and $b$ is the batch size. Extensive experiments demonstrate that our algorithm consistently achieves a 10-100x speedup with minimal loss in quality, addressing the slow runtime that has limited kernel $k$-means adoption in practice. We further complement these results with a theoretical analysis under an early stopping condition, proving that with a batch size of $\widetilde{\Omega}(\max\{\gamma^{4}, \gamma^{2}\} \cdot \epsilon^{-2})$, the algorithm terminates in $O(\gamma^2/\epsilon)$ iterations with high probability, where $\gamma$ bounds the norm of points in feature space and $\epsilon$ is a termination threshold. Our analysis holds for any reasonable center initialization, and when using $k$-means++ initialization, the algorithm achieves an approximation ratio of $O(\log k)$ in expectation. For normalized kernels, such as the Gaussian or Laplacian kernels, it holds that $\gamma = 1$. Taking $\epsilon = O(1)$ and $b = \Theta(\log n)$, the algorithm terminates in $O(1)$ iterations, with each iteration running in $\widetilde{O}(k)$ time.
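To make the setup concrete, below is a minimal NumPy sketch of the mini-batch idea in the kernel setting: centers are maintained implicitly as weighted combinations of sampled points, and each iteration assigns a batch of $b$ points via kernel distances and takes running-mean update steps. This is an illustrative sketch under our own assumptions, not the paper's algorithm: the function names, the Gaussian kernel choice, the random initialization (in place of $k$-means++), and the growing support-set representation are all ours.

```python
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    """Pairwise Gaussian (RBF) kernel between rows of X and rows of Y."""
    sq = (X**2).sum(axis=1)[:, None] + (Y**2).sum(axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def minibatch_kernel_kmeans(X, k, b, n_iters, kernel=gaussian_kernel, seed=0):
    """Illustrative mini-batch kernel k-means (not the paper's algorithm)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Centers live in feature space and are kept implicitly:
    # center j is sum_i W[j, i] * phi(S[i]) over a support set S.
    init = rng.choice(n, size=k, replace=False)  # random init; the paper analyzes k-means++
    S = X[init].copy()
    W = np.eye(k)
    counts = np.ones(k)                          # per-center counts for the step size

    for _ in range(n_iters):
        batch = X[rng.choice(n, size=b, replace=False)]
        Kbb = kernel(batch, batch)               # b x b
        Kbs = kernel(batch, S)                   # b x |S|
        Kss = kernel(S, S)                       # |S| x |S|
        # Squared feature-space distance:
        # ||phi(x) - c_j||^2 = K(x, x) - 2 * W_j . K(x, S) + W_j Kss W_j^T
        cc = np.einsum('ji,il,jl->j', W, Kss, W)
        d2 = Kbb.diagonal()[:, None] - 2.0 * Kbs @ W.T + cc[None, :]
        assign = d2.argmin(axis=1)
        # Grow the support set by the batch, then take running-mean steps.
        W = np.hstack([W, np.zeros((k, b))])
        S = np.vstack([S, batch])
        for t, j in enumerate(assign):
            counts[j] += 1
            eta = 1.0 / counts[j]                # running-mean learning rate
            W[j] *= 1.0 - eta                    # shrink existing weights
            W[j, W.shape[1] - b + t] = eta       # weight on the new point
    return S, W
```

A point is labeled afterwards by computing the same kernel distances against the returned $(S, W)$. Note that this naive sketch lets the support set grow by $b$ points per iteration; achieving the stated $\widetilde{O}(kb^2)$ per-iteration time would require the compact center representation developed in the paper.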
