A Unified Framework for Clustering Constrained Data without Locality Property

In this paper, we consider a class of constrained clustering problems of points in $\mathbb{R}^{d}$, where $d$ could be rather high. A common feature of these problems is that their optimal clusterings no longer have the locality property (due to the additional constraints), which is a key property required by many algorithms for their unconstrained counterparts. To overcome the difficulty caused by the loss of locality, we present a unified framework, called {\em Peeling-and-Enclosing (PnE)}, to iteratively solve two variants of the constrained clustering problems, {\em constrained $k$-means clustering} ($k$-CMeans) and {\em constrained $k$-median clustering} ($k$-CMedian). Our framework is based on two standalone geometric techniques, called {\em Simplex Lemma} and {\em Weaker Simplex Lemma}, for $k$-CMeans and $k$-CMedian, respectively. The simplex lemma (or weaker simplex lemma) enables us to efficiently approximate the mean (or median) point of an unknown set of points by searching a small-size grid, independent of the dimensionality of the space, in a simplex (or the surrounding region of a simplex), and thus can be used to handle high-dimensional data. If $k$ and $\frac{1}{\epsilon}$ are fixed numbers, our framework generates, in nearly linear time ({\em i.e.,} $O(n(\log n)^{k+1}d)$), $O((\log n)^{k})$ $k$-tuple candidates for the mean or median points, and one of them induces a $(1+\epsilon)$-approximation for $k$-CMeans or $k$-CMedian, where $n$ is the number of points. Combining this unified framework with a problem-specific selection algorithm (which determines the best $k$-tuple candidate), we obtain a $(1+\epsilon)$-approximation for each of the constrained clustering problems. We expect that our technique will be applicable to other constrained clustering problems without locality.
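The loss of locality mentioned above can be illustrated with a toy instance. The following is a minimal sketch, not the paper's algorithm: it assumes a hypothetical capacity constraint (each cluster holds at most `capacity` points), fixes two centers on a line, and brute-forces the cheapest feasible assignment under the squared-distance cost. The function names and the instance are made up for illustration.

```python
from itertools import product

def sq_cost(points, centers, assignment):
    """Sum of squared distances from each point to its assigned center."""
    return sum((p - centers[c]) ** 2 for p, c in zip(points, assignment))

def nearest_assignment(points, centers):
    """Unconstrained rule: every point goes to its nearest center (locality)."""
    return tuple(
        min(range(len(centers)), key=lambda c: (p - centers[c]) ** 2)
        for p in points
    )

def best_capacitated_assignment(points, centers, capacity):
    """Brute force over all assignments; keep the cheapest one in which
    no cluster exceeds `capacity` points (an assumed, illustrative constraint)."""
    best = None
    for a in product(range(len(centers)), repeat=len(points)):
        if any(a.count(c) > capacity for c in range(len(centers))):
            continue  # violates the capacity constraint
        cost = sq_cost(points, centers, a)
        if best is None or cost < best[0]:
            best = (cost, a)
    return best

# Toy 1-D instance: three points near the first center, one at the second.
points = [0.0, 1.0, 2.0, 10.0]
centers = [1.0, 10.0]

nearest = nearest_assignment(points, centers)
cost, constrained = best_capacitated_assignment(points, centers, capacity=2)
print(nearest, constrained, cost)
```

In this instance the point at 2.0 is closest to the first center, yet the cheapest feasible assignment sends it to the second one. A method that relies on nearest-center assignment therefore cannot recover the constrained optimum, which is why a selection step over candidate $k$-tuples has to be problem-specific.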