Kernel K-means clustering of distributional data

22 September 2025

Amparo Baíllo

Jose R. Berrendero

Martín Sánchez-Signorini


Main:14 Pages

3 Figures

Bibliography:2 Pages

15 Tables

Appendix:8 Pages

Abstract

We consider the problem of clustering a sample of probability distributions from a random distribution on $\mathbb{R}^p$. Our proposed partitioning method makes use of a symmetric, positive-definite kernel $k$ and its associated reproducing kernel Hilbert space (RKHS) $\mathcal{H}$. By mapping each distribution to its corresponding kernel mean embedding in $\mathcal{H}$, we obtain a sample in this RKHS where we carry out the $K$-means clustering procedure, which provides an unsupervised classification of the original sample. The procedure is simple and computationally feasible even for dimension $p>1$. The simulation studies provide insight into the choice of the kernel and its tuning parameter. The performance of the proposed clustering procedure is illustrated on a collection of Synthetic Aperture Radar (SAR) images.
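The procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each distribution is observed only through a finite sample, so that the kernel mean embedding is replaced by its empirical version, and it assumes a Gaussian kernel with bandwidth `sigma`. The function names, the random-label initialization, and the restart scheme (`n_init`) are all hypothetical choices made for this illustration. $K$-means in $\mathcal{H}$ then needs only the Gram matrix of inner products between embeddings.

```python
import numpy as np

def gaussian_kernel(X, Y, sigma):
    """Gaussian kernel matrix k(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    sq = (X**2).sum(axis=1)[:, None] + (Y**2).sum(axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def embedding_gram(samples, sigma):
    """G[i, j] = <mu_i, mu_j>_H, where mu_i is the empirical kernel mean
    embedding of the i-th sample (an array of shape (n_i, p))."""
    m = len(samples)
    G = np.empty((m, m))
    for i in range(m):
        for j in range(i, m):
            # Inner product of two empirical embeddings = mean of all kernel values.
            G[i, j] = G[j, i] = gaussian_kernel(samples[i], samples[j], sigma).mean()
    return G

def sq_dist_to_centroids(G, labels, K):
    """Squared RKHS distance from every embedding to every cluster centroid,
    computed from the Gram matrix only. Empty clusters get distance +inf."""
    m = G.shape[0]
    dist = np.full((m, K), np.inf)
    for c in range(K):
        idx = np.flatnonzero(labels == c)
        if idx.size:
            # ||mu_i - c||^2 = G_ii - 2 mean_j G_ij + mean_{j,l} G_jl, with j, l in the cluster
            dist[:, c] = np.diag(G) - 2.0 * G[:, idx].mean(axis=1) + G[np.ix_(idx, idx)].mean()
    return dist

def kernel_kmeans(G, K, n_init=10, n_iter=100, seed=0):
    """Lloyd-type iterations on the Gram matrix G, restarted n_init times from
    random labels (an assumption of this sketch); the run with the smallest
    within-cluster sum of squares is returned."""
    rng = np.random.default_rng(seed)
    m = G.shape[0]
    best_labels, best_obj = None, np.inf
    for _ in range(n_init):
        labels = rng.integers(K, size=m)
        for _ in range(n_iter):
            new_labels = sq_dist_to_centroids(G, labels, K).argmin(axis=1)
            if np.array_equal(new_labels, labels):
                break
            labels = new_labels
        obj = sq_dist_to_centroids(G, labels, K)[np.arange(m), labels].sum()
        if obj < best_obj:
            best_labels, best_obj = labels, obj
    return best_labels
```

To use the sketch, pass a list of arrays (one sample per distribution) to `embedding_gram` and the resulting matrix to `kernel_kmeans`. The restarts are included because Lloyd-type iterations in an RKHS can stall in poor local minima when started from random labels. The bandwidth `sigma` plays the role of the kernel tuning parameter mentioned in the abstract, and no particular value is recommended here.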
