
Enhancing Kernel Power K-means: Scalable and Robust Clustering with Random Fourier Features and Possibilistic Method

Main:7 Pages
5 Figures
Bibliography:2 Pages
6 Tables
Appendix:8 Pages
Abstract

Kernel power $k$-means (KPKM) leverages a family of means to mitigate local minima issues in kernel $k$-means. However, KPKM faces two key limitations: (1) the computational burden of the full kernel matrix restricts its use on large datasets, and (2) the lack of authentic centroid-sample assignment learning reduces its robustness to noise. To overcome these challenges, we propose RFF-KPKM, introducing the first approximation theory for applying random Fourier features (RFF) to KPKM. RFF-KPKM employs RFF to generate efficient, low-dimensional feature maps, bypassing the need for the full kernel matrix. Crucially, we are the first to establish strong theoretical guarantees for this combination: (1) an excess risk bound of $\mathcal{O}(\sqrt{k^3/n})$, (2) strong consistency with membership values, and (3) a $(1+\varepsilon)$ relative error bound achievable using RFF of dimension $\mathrm{poly}(\varepsilon^{-1}\log k)$. Furthermore, to improve robustness and enable learning from multiple kernels, we propose IP-RFF-MKPKM, an improved possibilistic RFF-based multiple kernel power $k$-means. IP-RFF-MKPKM ensures the scalability of MKPKM via RFF and refines cluster assignments by combining the merits of possibilistic and fuzzy memberships. Experiments on large-scale datasets demonstrate the superior efficiency and clustering accuracy of the proposed methods compared to state-of-the-art alternatives.
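To illustrate how RFF can stand in for the full kernel matrix, the following is a minimal sketch and not the paper's implementation. It assumes a Gaussian kernel with bandwidth `sigma` (the abstract does not name the kernel), and the function name `rff_features` and all parameter values are hypothetical choices made for this example. It maps $n$ samples to a $D$-dimensional feature matrix $Z$ whose inner products $ZZ^\top$ approximate the $n \times n$ kernel matrix, so a clustering routine could operate on $Z$ directly.

```python
import numpy as np

def rff_features(X, num_features, sigma, rng):
    """Random Fourier features for an assumed Gaussian kernel
    k(x, y) = exp(-||x - y||^2 / (2 * sigma^2))."""
    d = X.shape[1]
    # Frequencies drawn from the kernel's spectral density, N(0, sigma^-2 I).
    W = rng.normal(scale=1.0 / sigma, size=(d, num_features))
    # Random phases, uniform on [0, 2*pi).
    b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)
    return np.sqrt(2.0 / num_features) * np.cos(X @ W + b)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))   # toy data: 200 samples, 5 dimensions
sigma = 2.0                     # hypothetical bandwidth

Z = rff_features(X, num_features=2000, sigma=sigma, rng=rng)

# Exact kernel matrix, computed here only to check the approximation;
# avoiding this n x n matrix is the point of using RFF.
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
K_exact = np.exp(-sq_dists / (2.0 * sigma ** 2))
K_approx = Z @ Z.T

max_err = np.abs(K_exact - K_approx).max()
print(f"max abs kernel approximation error: {max_err:.4f}")
```

In this sketch the stored feature matrix costs $\mathcal{O}(nD)$ memory rather than the $\mathcal{O}(n^2)$ of the kernel matrix, and the approximation error shrinks as the hypothetical feature count `num_features` grows.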
