Mini-Batch Kernel -means

We present the first mini-batch kernel -means algorithm, offering an order of magnitude improvement in running time compared to the full batch algorithm. A single iteration of our algorithm takes time, significantly faster than the time required by the full batch kernel -means, where is the dataset size and is the batch size. Extensive experiments demonstrate that our algorithm consistently achieves a 10-100x speedup with minimal loss in quality, addressing the slow runtime that has limited kernel -means adoption in practice. We further complement these results with a theoretical analysis under an early stopping condition, proving that with a batch size of , the algorithm terminates in iterations with high probability, where bounds the norm of points in feature space and is a termination threshold. Our analysis holds for any reasonable center initialization, and when using -means++ initialization, the algorithm achieves an approximation ratio of in expectation. For normalized kernels, such as Gaussian or Laplacian it holds that . Taking and , the algorithm terminates in iterations, with each iteration running in time.
View on arXiv