
The Highest Dimensional Stochastic Blockmodel with a Regularized Estimator

Abstract

This paper advances the high dimensional frontier for network clustering. In the high dimensional Stochastic Blockmodel for a random network, the number of clusters (or blocks) K grows with the number of nodes N. Previous authors have studied the statistical estimation performance of spectral clustering and the maximum likelihood estimator under the high dimensional model. These authors do not allow K to grow faster than N^{1/2}. We study a model where, ignoring log terms, K can grow proportionally to N. Since the number of clusters must be smaller than the number of nodes, no reasonable model allows K to grow faster; thus, our asymptotic results are the "highest" dimensional. To push the asymptotic setting to this extreme, we develop a regularized maximum likelihood estimator. We prove that, under certain conditions, the proportion of nodes that the regularized estimator misclusters converges to zero. This is the first paper to explicitly introduce and demonstrate the advantages of statistical regularization for network analysis. Empirical observation in physical anthropology and an in-depth study of massive empirical networks motivate both our asymptotic setting and regularized estimator.
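For readers unfamiliar with the model, the following is a minimal illustrative sketch, not the paper's estimator. It samples a simple Stochastic Blockmodel and measures the proportion of misclustered nodes for a given labeling. The function names `sample_sbm` and `misclustered_fraction`, the two-parameter edge probabilities `p_in` and `p_out`, and the permutation-minimized error are all assumptions made for illustration; the paper's own definition of misclustering and its regularized maximum likelihood estimator may differ.

```python
import itertools
import random


def sample_sbm(n, k, p_in, p_out, seed=0):
    """Sample an undirected Stochastic Blockmodel graph without self-loops.

    Assumed simplification: one within-block edge probability (p_in) and
    one between-block edge probability (p_out).
    Returns (labels, adj), where labels[i] is the block of node i and
    adj is a symmetric 0/1 adjacency matrix stored as a list of lists.
    """
    rng = random.Random(seed)
    labels = [rng.randrange(k) for _ in range(n)]
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            p = p_in if labels[i] == labels[j] else p_out
            if rng.random() < p:
                adj[i][j] = adj[j][i] = 1
    return labels, adj


def misclustered_fraction(truth, estimate, k):
    """Fraction of mislabeled nodes, minimized over relabelings of the blocks.

    Brute force over all k! permutations, so only suitable for small k.
    """
    n = len(truth)
    best = n
    for perm in itertools.permutations(range(k)):
        errors = sum(1 for t, e in zip(truth, estimate) if perm[e] != t)
        best = min(best, errors)
    return best / n


if __name__ == "__main__":
    labels, adj = sample_sbm(n=60, k=3, p_in=0.5, p_out=0.05)
    # A relabeled copy of the truth counts as a perfect clustering.
    relabeled = [(label + 1) % 3 for label in labels]
    print(misclustered_fraction(labels, relabeled, 3))
```

In the abstract's asymptotic setting, `k` would grow with `n`, so the brute-force permutation search here is only practical for toy sizes.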
