
Semidefinite Programs for Exact Recovery of a Hidden Community

Abstract

We study a semidefinite programming (SDP) relaxation of the maximum likelihood estimation for exactly recovering a hidden community of cardinality $K$ from an $n \times n$ symmetric data matrix $A$, where for distinct indices $i, j$, $A_{ij} \sim P$ if $i, j$ are both in the community and $A_{ij} \sim Q$ otherwise, for two known probability distributions $P$ and $Q$. We identify a sufficient condition and a necessary condition for the success of SDP for the general model. For both the Bernoulli case ($P = {\rm Bern}(p)$ and $Q = {\rm Bern}(q)$ with $p > q$) and the Gaussian case ($P = \mathcal{N}(\mu, 1)$ and $Q = \mathcal{N}(0, 1)$ with $\mu > 0$), which correspond to the problems of planted dense subgraph recovery and submatrix localization respectively, the general results lead to the following findings: (1) If $K = \omega(n / \log n)$, SDP attains the information-theoretic recovery limits with sharp constants; (2) If $K = \Theta(n / \log n)$, SDP is order-wise optimal, but strictly suboptimal by a constant factor; (3) If $K = o(n / \log n)$ and $K \to \infty$, SDP is order-wise suboptimal. The same critical scaling for $K$ is found to hold, up to constant factors, for the performance of SDP on the stochastic block model of $n$ vertices partitioned into multiple communities of equal size $K$. A key ingredient in the proof of the necessary condition is a construction of a primal feasible solution based on random perturbation of the true cluster matrix.
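To make the data model concrete, the following is a minimal sketch of how one might sample the symmetric matrix $A$ described in the abstract. It is illustrative only and not taken from the paper: the function names (`sample_hidden_community`, `bernoulli`, `gaussian`) are hypothetical, the community is assumed to be chosen uniformly at random among subsets of size $K$, and the diagonal is set to zero because the abstract specifies the distribution only for distinct indices. The sketch covers data generation only, not the SDP relaxation itself.

```python
import random


def bernoulli(prob):
    """Return a sampler for Bern(prob); takes a random.Random instance."""
    return lambda rng: 1.0 if rng.random() < prob else 0.0


def gaussian(mean):
    """Return a sampler for N(mean, 1); takes a random.Random instance."""
    return lambda rng: rng.gauss(mean, 1.0)


def sample_hidden_community(n, K, draw_p, draw_q, seed=0):
    """Sample (A, community) from the hidden community model.

    Assumptions (not stated in the abstract): the community is a uniformly
    random K-subset of {0, ..., n-1}, and the diagonal of A is set to 0.
    """
    rng = random.Random(seed)
    community = set(rng.sample(range(n), K))
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # A_ij ~ P if both endpoints are in the community, else A_ij ~ Q.
            both_in = i in community and j in community
            value = draw_p(rng) if both_in else draw_q(rng)
            A[i][j] = value
            A[j][i] = value  # enforce symmetry
    return A, community


if __name__ == "__main__":
    # Bernoulli case (planted dense subgraph): P = Bern(p), Q = Bern(q), p > q.
    A_bern, C_bern = sample_hidden_community(8, 3, bernoulli(0.9), bernoulli(0.1))
    # Gaussian case (submatrix localization): P = N(mu, 1), Q = N(0, 1), mu > 0.
    A_gauss, C_gauss = sample_hidden_community(8, 3, gaussian(2.0), gaussian(0.0))
    print(sorted(C_bern), sorted(C_gauss))
```

Passing the two distributions as sampler functions keeps the generator agnostic to the choice of $P$ and $Q$, mirroring the general model before it is specialized to the Bernoulli and Gaussian cases.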
