35
1
v1v2 (latest)

A Scalable k-Medoids Clustering via Whale Optimization Algorithm

Main:16 Pages
10 Figures
Bibliography:3 Pages
6 Tables
Abstract

Unsupervised clustering has emerged as a critical tool for uncovering hidden patterns in vast, unlabeled datasets. However, traditional methods, such as Partitioning Around Medoids (PAM), struggle with scalability owing to their quadratic computational complexity. To address this limitation, we introduce WOA-kMedoids, a novel unsupervised clustering method that incorporates the Whale Optimization Algorithm (WOA), a nature-inspired metaheuristic inspired by the hunting strategies of humpback whales. By optimizing the centroid selection, WOA-kMedoids reduces the computational complexity from quadratic to near-linear with respect to the number of observations, enabling scalability to large datasets while maintaining high clustering accuracy. We evaluated WOA-kMedoids using 25 diverse time-series datasets from the UCR archive. Our empirical results show that WOA-kMedoids achieved a clustering performance comparable to PAM, with an average Rand Index (RI) of 0.731 compared to PAM's 0.739, outperforming PAM on 12 out of 25 datasets. While exhibiting a slightly higher runtime than PAM on small datasets (<300 observations), WOA-kMedoids outperformed PAM on larger datasets, with an average speedup of 1.7x and a maximum of 2.3x. The scalability of WOA-kMedoids, combined with its high accuracy, makes them a promising choice for unsupervised clustering in big data applications. This method has implications for efficient knowledge discovery in massive unlabeled datasets, particularly where traditional k-medoids methods are computationally infeasible, including IoT anomaly detection, biomedical signal analysis, and customer behavior clustering.

View on arXiv
@article{chenan2025_2408.16993,
  title={ A Scalable k-Medoids Clustering via Whale Optimization Algorithm },
  author={ Huang Chenan and Narumasa Tsutsumida },
  journal={arXiv preprint arXiv:2408.16993},
  year={ 2025 }
}
Comments on this paper