Kernel K-means clustering of distributional data
We consider the problem of clustering a sample of probability distributions drawn from a random distribution. Our proposed partitioning method uses a symmetric, positive-definite kernel and its associated reproducing kernel Hilbert space (RKHS). By mapping each distribution to its kernel mean embedding in the RKHS, we obtain a sample in this space on which we run the K-means clustering procedure, yielding an unsupervised classification of the original sample. The procedure is simple and remains computationally feasible even for multivariate distributions. Simulation studies provide insight into the choice of the kernel and its tuning parameter. The performance of the proposed clustering procedure is illustrated on a collection of Synthetic Aperture Radar (SAR) images.
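The pipeline above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation. We assume each distribution is observed through an i.i.d. sample of points. The kernel mean embedding mu_P = E[k(., X)] is estimated by its empirical mean, so the RKHS inner product <mu_P, mu_Q> is estimated by averaging k(x_i, y_j) over all pairs of points. We further assume a Gaussian kernel with bandwidth sigma, and run K-means in the RKHS using only this Gram matrix, with a farthest-point initialization. The function names (gaussian_gram, embedding_gram, kernel_kmeans) are hypothetical, and so are the kernel choice and the initialization.

```python
import numpy as np

def gaussian_gram(X, Y, sigma):
    # Assumed kernel: k(x, y) = exp(-||x - y||^2 / (2 sigma^2)), for all pairs of rows
    d2 = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-np.maximum(d2, 0.0) / (2 * sigma**2))

def embedding_gram(samples, sigma):
    # G[i, j] estimates <mu_i, mu_j>_H as the mean of k over all point pairs
    n = len(samples)
    G = np.empty((n, n))
    for i in range(n):
        for j in range(i, n):
            G[i, j] = G[j, i] = gaussian_gram(samples[i], samples[j], sigma).mean()
    return G

def kernel_kmeans(G, K, n_iter=100, seed=0):
    # K-means on the embeddings, expressed through the Gram matrix G only
    rng = np.random.default_rng(seed)
    n = G.shape[0]
    diag = np.diag(G)
    # Farthest-point initialization (an assumption): start from a random
    # embedding, then repeatedly add the one farthest from the chosen centers
    centers = [int(rng.integers(n))]
    for _ in range(1, K):
        d = np.min(diag[:, None] - 2 * G[:, centers] + diag[centers][None, :], axis=1)
        centers.append(int(d.argmax()))
    labels = (diag[:, None] - 2 * G[:, centers] + diag[centers][None, :]).argmin(1)
    for _ in range(n_iter):
        dist = np.full((n, K), np.inf)
        for c in range(K):
            idx = labels == c
            m = idx.sum()
            if m == 0:
                continue  # empty cluster: no embedding can be assigned to it
            # ||mu_i - centroid_c||^2 = G_ii - (2/m) sum_j G_ij + (1/m^2) sum_{j,l} G_jl
            dist[:, c] = diag - 2 * G[:, idx].sum(1) / m + G[np.ix_(idx, idx)].sum() / m**2
        new = dist.argmin(1)
        if np.array_equal(new, labels):
            break  # assignments are stable
        labels = new
    return labels

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Ten hypothetical 2-D samples: five from N(0, I), five from N((5, 5), I)
    samples = [rng.normal(0.0, 1.0, size=(50, 2)) for _ in range(5)]
    samples += [rng.normal(5.0, 1.0, size=(50, 2)) for _ in range(5)]
    print(kernel_kmeans(embedding_gram(samples, sigma=1.0), K=2))
```

Because the algorithm only needs inner products between embeddings, the dimension of the underlying points enters only through the kernel evaluations. The bandwidth sigma plays the role of the tuning parameter mentioned above.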