Bipartite Correlation Clustering -- Maximizing Agreements

Abstract

In Bipartite Correlation Clustering (BCC) we are given a complete bipartite graph $G$ with `+' and `-' edges, and we seek a vertex clustering that maximizes the number of agreements: the number of all `+' edges within clusters plus all `-' edges cut across clusters. BCC is known to be NP-hard. We present a novel approximation algorithm for $k$-BCC, a variant of BCC with an upper bound $k$ on the number of clusters. Our algorithm outputs a $k$-clustering that provably achieves a number of agreements within a multiplicative $(1-\delta)$-factor of the optimal, for any desired accuracy $\delta$. It relies on solving a combinatorially constrained bilinear maximization on the bi-adjacency matrix of $G$. It runs in time exponential in $k$ and $\delta^{-1}$, but linear in the size of the input. Further, we show that, in the (unconstrained) BCC setting, a $(1-\delta)$-approximation can be achieved by $O(\delta^{-1})$ clusters regardless of the size of the graph. In turn, our $k$-BCC algorithm implies an Efficient PTAS for the BCC objective of maximizing agreements.
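To make the objective concrete, the following is a minimal sketch of how the number of agreements could be counted for a given clustering. It is not the paper's algorithm; it only evaluates the objective. The function name `count_agreements`, the encoding of the bi-adjacency matrix as rows of +1/-1 entries, and the use of integer cluster labels for the two vertex sides are all assumptions made for illustration.

```python
def count_agreements(B, left_labels, right_labels):
    """Count agreements of a clustering on a complete bipartite signed graph.

    B is an assumed encoding of the bi-adjacency matrix: B[i][j] is +1 for a
    `+' edge and -1 for a `-' edge between left vertex i and right vertex j.
    left_labels[i] and right_labels[j] are the (hypothetical) cluster ids.
    """
    total = 0
    for i, row in enumerate(B):
        for j, sign in enumerate(row):
            same_cluster = left_labels[i] == right_labels[j]
            # A `+' edge agrees when it lies inside a cluster;
            # a `-' edge agrees when it is cut across clusters.
            if (sign > 0 and same_cluster) or (sign < 0 and not same_cluster):
                total += 1
    return total


# Toy instance: two left and two right vertices form one block, the rest another.
B = [
    [+1, +1, -1],
    [+1, +1, -1],
    [-1, -1, +1],
]

two_clusters = count_agreements(B, [0, 0, 1], [0, 0, 1])
one_cluster = count_agreements(B, [0, 0, 0], [0, 0, 0])
print(two_clusters, one_cluster)
```

On this toy instance the two-cluster labeling agrees with every edge, while placing all vertices in a single cluster agrees only with the `+' edges.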
