ResearchTrend.AI

arXiv:2209.06975
Wasserstein K-means for clustering probability distributions

14 September 2022
Yubo Zhuang
Xiaohui Chen
Yun Yang
Abstract

Clustering is an important exploratory data analysis technique for grouping objects based on their similarity. The widely used K-means clustering method relies on some notion of distance to partition data into a smaller number of groups. In Euclidean space, the centroid-based and distance-based formulations of K-means are equivalent. In modern machine learning applications, data often arise as probability distributions, and a natural generalization to handle measure-valued data is to use the optimal transport metric. Due to the non-negative Alexandrov curvature of the Wasserstein space, barycenters suffer from regularity and non-robustness issues. The peculiar behaviors of Wasserstein barycenters may make the centroid-based formulation fail to represent the within-cluster data points, while the more direct distance-based K-means approach and its semidefinite program (SDP) relaxation are capable of recovering the true cluster labels. In the special case of clustering Gaussian distributions, we show that the SDP-relaxed Wasserstein K-means can achieve exact recovery provided the clusters are well-separated under the 2-Wasserstein metric. Our simulation and real-data examples also demonstrate that distance-based K-means can achieve better classification performance than the standard centroid-based K-means for clustering probability distributions and images.
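To make the distance-based formulation concrete, here is a minimal Python sketch that clusters one-dimensional Gaussians using the closed-form 2-Wasserstein distance between Gaussians, W2² = (m1 − m2)² + (σ1 − σ2)². The greedy farthest-point seeding and the reassignment loop are illustrative choices of ours, not the SDP relaxation analyzed in the paper; they only demonstrate clustering from pairwise Wasserstein distances without computing barycenters.

```python
import numpy as np

def w2_gaussian_1d(m1, s1, m2, s2):
    # Closed-form 2-Wasserstein distance between N(m1, s1^2) and N(m2, s2^2):
    # W2^2 = (m1 - m2)^2 + (s1 - s2)^2
    return np.hypot(m1 - m2, s1 - s2)

def distance_based_kmeans(D, k, n_iter=50):
    """Cluster points given only a pairwise distance matrix D.

    Greedy farthest-point seeding, then each point is reassigned to the
    cluster minimizing its mean squared distance to that cluster's current
    members. This is a simple iterative heuristic for the pairwise-distance
    K-means objective (not the paper's SDP relaxation).
    """
    n = D.shape[0]
    seeds = [0]
    for _ in range(k - 1):
        # Next seed: the point farthest from all chosen seeds.
        seeds.append(int(np.argmax(D[:, seeds].min(axis=1))))
    labels = np.argmin(D[:, seeds], axis=1)
    for _ in range(n_iter):
        costs = np.stack([
            np.mean(D[:, labels == c] ** 2, axis=1) if (labels == c).any()
            else np.full(n, np.inf)
            for c in range(k)
        ])
        new = np.argmin(costs, axis=0)
        if np.array_equal(new, labels):
            break
        labels = new
    return labels

# Two well-separated groups of Gaussians (means near 0 vs. near 10).
params = [(0.0, 1.0), (0.2, 1.1), (-0.1, 0.9),
          (10.0, 1.0), (10.3, 0.8), (9.8, 1.2)]
D = np.array([[w2_gaussian_1d(m1, s1, m2, s2) for m2, s2 in params]
              for m1, s1 in params])
labels = distance_based_kmeans(D, k=2)
```

With well-separated clusters, as in the toy example above, the two groups of Gaussians are recovered exactly from the pairwise W2 matrix alone.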
