
Exponentially Consistent Nonparametric Linkage-Based Clustering of Data Sequences

IEEE Transactions on Signal Processing (IEEE TSP), 2024
Main: 13 pages, 14 figures; Bibliography: 1 page
Abstract

In this paper, we consider nonparametric clustering of $M$ independent and identically distributed (i.i.d.) data sequences generated from unknown distributions. The distributions of the $M$ data sequences belong to $K$ underlying distribution clusters. Existing results on exponentially consistent nonparametric clustering algorithms, like single linkage-based (SLINK) clustering and $k$-medoids distribution clustering, assume that the maximum intra-cluster distance ($d_L$) is smaller than the minimum inter-cluster distance ($d_H$). First, in the fixed sample size (FSS) setting, we show that exponential consistency can be achieved for SLINK clustering under a less strict assumption, $d_I < d_H$, where $d_I$ is the maximum distance between any two sub-clusters of a cluster that partition the cluster. Note that $d_I < d_L$ in general. Thus, our results show that SLINK is exponentially consistent for a larger class of problems than previously known. In our simulations, we also identify examples where $k$-medoids clustering is unable to find the true clusters, but SLINK is exponentially consistent. Then, we propose a sequential clustering algorithm, named SLINK-SEQ, based on SLINK and prove that it is also exponentially consistent. Simulation results show that the SLINK-SEQ algorithm requires a smaller expected number of samples than the FSS SLINK algorithm for the same probability of error.
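To make the role of the condition $d_I < d_H$ concrete, the following is a minimal sketch of fixed-sample-size single-linkage clustering of data sequences, not the authors' implementation. Several choices are assumptions made for illustration only: the Kolmogorov-Smirnov distance between empirical CDFs as the distance between sequences (the abstract does not name one), the helper names `ks_distance` and `slink_clusters`, and the toy data, which are deterministic grids standing in for samples. The first cluster is a "chain" of shifted uniform-like sequences, so its maximum intra-cluster distance exceeds the minimum inter-cluster distance, while every split of the chain into two parts still has a close pair across the split.

```python
import numpy as np


def ks_distance(x, y):
    """Kolmogorov-Smirnov distance between the empirical CDFs of x and y."""
    grid = np.sort(np.concatenate([x, y]))
    cdf_x = np.searchsorted(np.sort(x), grid, side="right") / len(x)
    cdf_y = np.searchsorted(np.sort(y), grid, side="right") / len(y)
    return float(np.max(np.abs(cdf_x - cdf_y)))


def slink_clusters(sequences, num_clusters):
    """Single-linkage clustering: merge closest clusters until num_clusters remain."""
    m = len(sequences)
    # All pairwise distances, sorted in increasing order.
    pairs = sorted(
        (ks_distance(sequences[i], sequences[j]), i, j)
        for i in range(m)
        for j in range(i + 1, m)
    )
    parent = list(range(m))  # union-find forest over sequence indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    remaining = m
    for _, i, j in pairs:
        if remaining == num_clusters:
            break
        ri, rj = find(i), find(j)
        if ri != rj:  # the closest pair lying in two different clusters
            parent[rj] = ri
            remaining -= 1

    # Relabel cluster roots as 0, 1, ... in order of first appearance.
    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(m)]


# Hypothetical toy data: a chain of five sequences, then a second cluster of two.
base = np.linspace(0.0, 1.0, 101)
shifts = [0.0, 0.3, 0.6, 0.9, 1.2, 1.9, 2.2]
sequences = [base + s for s in shifts]

labels = slink_clusters(sequences, num_clusters=2)
print(labels)  # [0, 0, 0, 0, 0, 1, 1]
```

In this toy example the two ends of the chain are at distance 1, while the closest pair from different clusters is at distance about 0.7, so the classical condition $d_L < d_H$ fails. Neighbouring chain members are only about 0.3 apart, however, so single linkage merges the whole chain before it ever considers an inter-cluster link.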
