Semidefinite Programs for Exact Recovery of a Hidden Community

Abstract

We study a semidefinite programming (SDP) relaxation of the maximum likelihood estimation for exactly recovering a hidden community of cardinality $K$ from an $n \times n$ symmetric data matrix $A$, where for distinct indices $i,j$, $A_{ij} \sim P$ if $i, j$ are both in the community and $A_{ij} \sim Q$ otherwise, for two known probability distributions $P$ and $Q$. We identify a sufficient condition and a necessary condition for the success of SDP for the general model. For both the Bernoulli case ($P={\rm Bern}(p)$ and $Q={\rm Bern}(q)$ with $p>q$) and the Gaussian case ($P=\mathcal{N}(\mu,1)$ and $Q=\mathcal{N}(0,1)$ with $\mu>0$), which correspond to the problem of planted dense subgraph recovery and submatrix localization respectively, the general results lead to the following findings: (1) If $K=\omega(n/\log n)$, SDP attains the information-theoretic recovery limits with sharp constants; (2) If $K=\Theta(n/\log n)$, SDP is order-wise optimal, but strictly suboptimal by a constant factor; (3) If $K=o(n/\log n)$ and $K \to \infty$, SDP is order-wise suboptimal. A key ingredient in the proof of the necessary condition is a construction of a primal feasible solution based on random perturbation of the true cluster matrix.
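
To make the data model concrete, here is a minimal sketch that samples one instance of the Gaussian case ($P=\mathcal{N}(\mu,1)$, $Q=\mathcal{N}(0,1)$). The function name `sample_hidden_community`, the indicator vector `xi`, the `seed` argument, and the choice to set the diagonal to zero are illustrative assumptions; the abstract only specifies the distribution of $A_{ij}$ for distinct $i,j$.

```python
import numpy as np


def sample_hidden_community(n, K, mu, seed=0):
    """Sample a symmetric n x n matrix A with a hidden community of size K.

    Gaussian case: A_ij ~ N(mu, 1) if i and j are both in the community,
    and A_ij ~ N(0, 1) otherwise, for distinct i, j.
    Assumption: the diagonal is set to zero, since the model leaves it unspecified.
    """
    rng = np.random.default_rng(seed)

    # Choose the community uniformly at random among subsets of size K.
    community = rng.choice(n, size=K, replace=False)
    xi = np.zeros(n, dtype=bool)
    xi[community] = True

    # Mean is mu on community pairs and 0 elsewhere.
    mean = mu * np.outer(xi, xi)

    # Draw the strict upper triangle, then mirror it so that A is symmetric.
    upper = np.triu(mean + rng.standard_normal((n, n)), k=1)
    A = upper + upper.T
    return A, xi


if __name__ == "__main__":
    A, xi = sample_hidden_community(n=200, K=20, mu=3.0)
    inside = A[np.ix_(xi, xi)]
    print("mean entry inside the community block:", inside.mean())
    print("mean entry overall:", A.mean())
```

Exact recovery then means identifying the support of `xi` from `A` alone; the Bernoulli case could be sampled the same way by replacing the Gaussian draws with Bernoulli draws of parameter $p$ or $q$.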
