Correlation Clustering with Low-Rank Matrices

Correlation clustering is a technique for aggregating data based on qualitative information about which pairs of objects are labeled 'similar' or 'dissimilar.' Because the optimization problem is NP-hard, much of the previous literature focuses on finding approximation algorithms. In this paper we explore a new approach to correlation clustering by considering how to solve the problem when the data to be clustered can be represented by a low-rank matrix. Many real-world datasets are known to be inherently low-dimensional, and our goal is to establish a tractable approach to correlation clustering in this important setting. We prove in particular that correlation clustering can be solved in polynomial time when the underlying matrix is positive semidefinite with small constant rank, but that the task remains NP-hard in the presence of even one negative eigenvalue. Based on our theoretical results, we develop an algorithm for efficiently solving low-rank positive semidefinite correlation clustering by employing a procedure for zonotope vertex enumeration. We demonstrate the effectiveness and speed of our algorithm by using it to solve several clustering problems on both synthetic and real-world data.
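To make the low-rank positive semidefinite setting concrete, here is a minimal sketch. It is not the paper's algorithm: it assumes the objective is to maximize the total similarity A[i][j] over pairs placed in the same cluster, and it brute-forces tiny instances instead of using zonotope vertex enumeration. If A = V V^T, with one low-dimensional row of V per object, that total (diagonal included, which only adds a constant) equals the sum over clusters of the squared norm of the cluster's vector sum. The names `cluster_score`, `brute_force_best`, and the example data are hypothetical.

```python
from itertools import product


def cluster_score(vectors, labels):
    """Sum over clusters of the squared norm of the cluster's vector sum.

    With A = V V^T this equals the sum of A[i][j] over all ordered pairs
    (i, j) that share a cluster, including i == j (assumed objective).
    """
    dim = len(vectors[0])
    sums = {}
    for v, lab in zip(vectors, labels):
        acc = sums.setdefault(lab, [0.0] * dim)
        for d in range(dim):
            acc[d] += v[d]
    return sum(sum(x * x for x in acc) for acc in sums.values())


def brute_force_best(vectors, max_clusters):
    """Try every label assignment; exponential, so for tiny inputs only."""
    best_labels, best_score = None, float("-inf")
    for labels in product(range(max_clusters), repeat=len(vectors)):
        score = cluster_score(vectors, labels)
        if score > best_score:  # strict: keep the first best assignment
            best_labels, best_score = labels, score
    return best_labels, best_score


# Hypothetical rank-1 instance: each object is a single signed number,
# so A[i][j] = v_i * v_j is positive exactly when the signs agree.
V = [[1.0], [2.0], [-1.0], [-3.0]]
labels, score = brute_force_best(V, max_clusters=2)
print(labels, score)
```

In this rank-1 instance the best assignment groups the positive entries together and the negative entries together, which is the intuition behind treating the low-rank case geometrically.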