Provably noise-robust, regularised k-means clustering

We consider the problem of clustering in the presence of noise, that is, when, on top of the cluster structure, the data also contains a subset of unstructured points. Our goal is to detect the cluster structure despite the presence of these unstructured points; any algorithm that achieves this goal is called noise-robust. We consider a regularisation method that converts any center-based clustering objective into a noise-robust one, and we provide robustness guarantees for this method. More specifically, in this paper we focus on the popular k-means objective. We first show that the regularised version of k-means is also NP-hard. We then propose an algorithm based on a convex (SDP) relaxation of the regularised objective and prove robustness guarantees for this SDP-based algorithm with respect to existing robustness measures. We complement our findings with experiments showing the efficiency of our algorithm.
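To make the idea of a regularised objective concrete, here is a minimal sketch under a stated assumption: the regulariser charges a constant penalty `lam` for every point declared noise, so the cost is the usual k-means cost of the clustered points plus `lam` times the number of noise points. The functions `regularised_cost` and `regularised_lloyd` are hypothetical names for illustration. The solver is a plain Lloyd-style local search swapped in for readability. It is *not* the SDP-based algorithm described above and carries none of its guarantees.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]

def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points: List[Sequence[float]]) -> Point:
    """Coordinate-wise mean of a non-empty list of points."""
    dim = len(points[0])
    return tuple(sum(p[i] for p in points) / len(points) for i in range(dim))

def regularised_cost(X, centers, labels, lam: float) -> float:
    """ASSUMED objective: k-means cost of clustered points + lam per noise point (label -1)."""
    return sum(lam if lab == -1 else sq_dist(x, centers[lab])
               for x, lab in zip(X, labels))

def regularised_lloyd(X, init_centers, lam: float, max_iter: int = 100):
    """Hypothetical Lloyd-style local search for the assumed objective (not the SDP method)."""
    centers = [tuple(c) for c in init_centers]
    k = len(centers)
    labels: Optional[List[int]] = None
    for _ in range(max_iter):
        # Assignment step: per point, choose the cheaper of its nearest center or noise.
        new_labels = []
        for x in X:
            best = min(range(k), key=lambda j: sq_dist(x, centers[j]))
            # Keep x in its nearest cluster only if that is cheaper than paying lam.
            new_labels.append(best if sq_dist(x, centers[best]) < lam else -1)
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        # Update step: move each center to the mean of its (non-noise) members.
        for j in range(k):
            members = [x for x, lab in zip(X, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = mean(members)
    return centers, labels
```

For example, consider two tight groups near `(0, 0)` and `(10, 10)` plus a far-away point `(50, 50)`, with initial centers `[(0.0, 0.0), (10.0, 10.0)]` and `lam=4.0`. In this case the labels come out as `[0, 0, 0, 1, 1, 1, -1]`, so the outlier is set aside as noise instead of dragging a center toward it. Neither step can increase the assumed objective. Each assignment is optimal for fixed centers, and each mean is optimal for a fixed cluster. The search can still stall in a local optimum that depends on the initial centers, which is one motivation for a convex relaxation.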