
Community estimation in $G$-models via CORD

Abstract

Given a zero-mean random vector ${\bf X} =: (X_1,\ldots,X_p) \in R^p$, we consider the problem of defining and estimating a partition $G$ of $\{1,\ldots,p\}$ such that the components of ${\bf X}$ with indices in the same group of the partition have a similar, community-like behavior. We introduce a new model, the $G$-exchangeable model, to define group similarity. This model is a natural extension of the more commonly used $G$-latent model, for which the partition $G$ is generally not identifiable without additional restrictions on ${\bf X}$. In contrast, we show that for any random vector ${\bf X}$ there exists an identifiable partition $G$ according to which ${\bf X}$ is $G$-exchangeable, thereby providing a clear target for community estimation. Moreover, we provide another model, the $G$-block covariance model, which generalizes the $G$-exchangeable model and can be of interest in its own right for defining group similarity. We discuss connections between the three types of $G$-models. We exploit the connection with $G$-block covariance models to develop a new metric, CORD, and a homonymous method for community estimation. We specialize and analyze our method for Gaussian copula data. We show that this method recovers the partition according to which ${\bf X}$ is $G$-exchangeable with a $G$-block copula correlation matrix. In the particular case of Gaussian distributions, this estimator, under mild assumptions, identifies the unique minimal partition according to the $G$-latent model. The CORD estimator is consistent as long as the communities are separated at a rate that we prove to be minimax optimal, via lower bound calculations. Our procedure is fast, and extensive numerical studies show that it recovers communities defined by our models, while existing variable clustering algorithms typically fail to do so. This is further supported by two real-data examples.
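The abstract does not spell out the CORD metric, so the following is only a minimal sketch under a stated assumption: that the CORD distance between variables $a$ and $b$ is $\max_{c \neq a,b} |R_{ac} - R_{bc}|$ for a correlation matrix $R$. The function name `cord_distance`, the group labels, and the toy matrix are hypothetical illustrations, not the authors' implementation. The sketch shows that, for a $G$-block correlation matrix, this distance vanishes within a group and is positive across groups.

```python
import numpy as np

def cord_distance(R):
    """Pairwise max over c != a, b of |R[a, c] - R[b, c]|.

    Assumed form of the CORD distance; R is a p x p correlation matrix, p >= 3.
    """
    p = R.shape[0]
    D = np.zeros((p, p))
    for a in range(p):
        for b in range(a + 1, p):
            mask = np.ones(p, dtype=bool)
            mask[[a, b]] = False  # exclude c = a and c = b
            D[a, b] = D[b, a] = np.max(np.abs(R[a, mask] - R[b, mask]))
    return D

# Hypothetical population example: p = 4 with groups {0, 1} and {2, 3}.
groups = np.array([0, 0, 1, 1])
C = np.array([[0.6, 0.2],
              [0.2, 0.5]])            # group-level correlations (made up)
R = C[groups][:, groups].copy()       # off-diagonal entries depend only on groups
np.fill_diagonal(R, 1.0)

D = cord_distance(R)
print(np.round(D, 2))
```

In this toy matrix the within-group distances are exactly zero, so thresholding `D` would separate the two groups. With sample correlations the within-group entries would only be approximately zero, which is why a separation condition between communities matters.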
