Community estimation in $G$-models via CORD

8 August 2015

Abstract

Given a zero-mean random vector ${\bf X}:=(X_1,\ldots,X_p)\in \mathbb{R}^p$, we consider the problem of defining and estimating a partition $G$ of $\{1,\ldots,p\}$ such that the components of ${\bf X}$ with indices in the same group of the partition have a similar, community-like behavior. We introduce a new model, the $G$-exchangeable model, to define group similarity. This model is a natural extension of the more commonly used $G$-latent model, for which the partition $G$ is generally not identifiable without additional restrictions on ${\bf X}$. In contrast, we show that for any random vector ${\bf X}$ there exists an identifiable partition $G$ according to which ${\bf X}$ is $G$-exchangeable, thereby providing a clear target for community estimation. Moreover, we provide another model, the $G$-block covariance model, which generalizes the $G$-exchangeable model and can be of interest in its own right for defining group similarity. We discuss connections between the three types of $G$-models. We exploit the connection with $G$-block covariance models to develop a new metric, CORD, and a homonymous method for community estimation. We specialize and analyze our method for Gaussian copula data. We show that this method recovers the partition according to which ${\bf X}$ is $G$-exchangeable with a $G$-block copula correlation matrix. In the particular case of Gaussian distributions, this estimator, under mild assumptions, identifies the unique minimal partition according to the $G$-latent model. The CORD estimator is consistent as long as the communities are separated at a rate that we prove, via lower bound calculations, to be minimax optimal. Our procedure is fast, and extensive numerical studies show that it recovers communities defined by our models, while existing variable clustering algorithms typically fail to do so. This is further supported by two real-data examples.