Community Detection in the Hypergraph SBM: Optimal Recovery Given the
Similarity Matrix
Community detection is a fundamental problem in network science. In this paper, we consider community detection in hypergraphs drawn from the hypergraph stochastic block model (HSBM), with a focus on exact community recovery. We study the performance of polynomial-time algorithms which operate on the similarity matrix $W$, where $W_{ij}$ reports the number of hyperedges containing both $i$ and $j$. Under this information model, Kim, Bandeira, and Goemans determined the information-theoretic threshold for exact recovery in the logarithmic degree regime, and proposed a semidefinite programming relaxation which they conjectured to be optimal. We confirm this conjecture. We also design a simple and highly efficient spectral algorithm with nearly linear runtime and show that it achieves the information-theoretic threshold. Moreover, the spectral algorithm also succeeds in denser regimes and is considerably more efficient than previous approaches, establishing it as the method of choice. Our analysis of the spectral algorithm crucially relies on strong entrywise bounds on the eigenvectors of $W$. Our bounds are inspired by the work of Abbe, Fan, Wang, and Zhong, who developed entrywise bounds for eigenvectors of symmetric matrices with independent entries. Despite the complex dependency structure in similarity matrices, we prove similar entrywise guarantees.
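As a rough illustration of the information model, the sketch below builds a similarity matrix from a list of hyperedges and splits the vertices by the signs of the eigenvector of its second-largest eigenvalue. It is a minimal toy under stated assumptions, not the paper's algorithm: the function names `similarity_matrix` and `spectral_labels`, the six-vertex example, and the choice of a dense eigendecomposition are all hypothetical, and the abstract does not specify the exact spectral procedure.

```python
from itertools import combinations

import numpy as np


def similarity_matrix(n, hyperedges):
    """Return W where W[i, j] counts hyperedges containing both i and j."""
    W = np.zeros((n, n), dtype=int)
    for edge in hyperedges:
        # Each unordered pair inside a hyperedge contributes 1 to W[i, j].
        for i, j in combinations(sorted(set(edge)), 2):
            W[i, j] += 1
            W[j, i] += 1
    return W


def spectral_labels(W):
    """Assumed two-community rule: sign pattern of the second eigenvector."""
    # eigh returns eigenvalues in ascending order, so column -2 belongs
    # to the second-largest eigenvalue.
    _, vecs = np.linalg.eigh(W.astype(float))
    u2 = vecs[:, -2]
    return np.where(u2 >= 0, 1, -1)


# Toy 3-uniform hypergraph with planted communities {0, 1, 2} and {3, 4, 5}:
# two hyperedges inside each community plus two crossing hyperedges.
hyperedges = [(0, 1, 2), (0, 1, 2), (3, 4, 5), (3, 4, 5), (0, 1, 3), (2, 4, 5)]
W = similarity_matrix(6, hyperedges)
labels = spectral_labels(W)
print(W)
print(labels)  # communities are identified only up to a global sign flip
```

A dense `eigh` call is used here only for brevity. A nearly linear runtime, as claimed in the abstract, would require an iterative sparse eigensolver rather than a full decomposition.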