Submatrix localization via message passing

Abstract

The principal submatrix localization problem deals with recovering a $K\times K$ principal submatrix of elevated mean $\mu$ in a large $n\times n$ symmetric matrix subject to additive standard Gaussian noise. This problem serves as a prototypical example for community detection, in which the community corresponds to the support of the submatrix. The main result of this paper is that in the regime $\Omega(\sqrt{n}) \leq K \leq o(n)$, the support of the submatrix can be weakly recovered (with $o(K)$ misclassification errors on average) by an optimized message passing algorithm if $\lambda = \mu^2K^2/n$, the signal-to-noise ratio, exceeds $1/e$. This extends a result by Deshpande and Montanari previously obtained for $K=\Theta(\sqrt{n})$. In addition, the algorithm can be extended to provide exact recovery whenever information-theoretically possible and achieve the information limit of exact recovery as long as $K \geq \frac{n}{\log n} \left(\frac{1}{8e} + o(1)\right)$. The total running time of the algorithm is $O(n^2\log n)$. Another version of the submatrix localization problem, known as noisy biclustering, aims to recover a $K_1\times K_2$ submatrix of elevated mean $\mu$ in a large $n_1\times n_2$ Gaussian matrix. The optimized message passing algorithm and its analysis are adapted to the bicluster problem assuming $\Omega(\sqrt{n_i}) \leq K_i \leq o(n_i)$ and $K_1\asymp K_2$. A sharp information-theoretic condition for the weak recovery of both clusters is also identified.
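To make the setup concrete, the following minimal sketch samples one instance of the principal submatrix model and evaluates the signal-to-noise ratio $\lambda = \mu^2K^2/n$ against the $1/e$ threshold quoted above. It does not implement the message passing algorithm itself. The function names `planted_submatrix` and `snr`, the sizes `n` and `K`, and the choice of unit-variance diagonal noise are assumptions made for illustration only; the abstract does not specify them.

```python
import math
import numpy as np

def planted_submatrix(n, K, mu, seed=0):
    """Sample a symmetric n x n matrix with a K x K principal block of mean mu.

    Assumption: the noise is standard Gaussian on and above the diagonal and
    mirrored below it; the diagonal convention is not given in the abstract.
    """
    rng = np.random.default_rng(seed)
    # Hidden community: K distinct indices chosen uniformly at random.
    support = np.sort(rng.choice(n, size=K, replace=False))
    G = rng.standard_normal((n, n))
    # Keep the upper triangle (with diagonal) and mirror it to get symmetry.
    W = np.triu(G) + np.triu(G, 1).T
    A = W.copy()
    # Elevate the mean on the principal submatrix indexed by the support.
    A[np.ix_(support, support)] += mu
    return A, support

def snr(n, K, mu):
    """Signal-to-noise ratio lambda = mu^2 K^2 / n from the abstract."""
    return mu**2 * K**2 / n

n, K = 400, 40                            # illustrative sizes with sqrt(n) <= K < n
mu = 1.2 * math.sqrt(n / math.e) / K      # chosen so that lambda = 1.44 / e
A, support = planted_submatrix(n, K, mu)
lam = snr(n, K, mu)
print("lambda exceeds 1/e:", lam > 1 / math.e)
```

With `mu` chosen this way, $\lambda = 1.44/e$, so the instance sits just above the weak recovery threshold stated for the message passing algorithm. Whether recovery actually succeeds at such small `n` is not something this sketch demonstrates, since the stated result is asymptotic.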
