Edge exchangeable models for network data

Exchangeable models for vertex labeled graphs cannot replicate the large sample behaviors of sparsity and power law degree distributions observed in many network datasets. Out of this mathematical impossibility emerges the question of how network data can be modeled in a way that reflects known empirical behaviors and respects basic statistical principles. We address this question by observing that edges, not vertices, act as the statistical units in many network datasets, making a theory of edge labeled networks more natural for these applications. In this context we introduce the new invariance principle of edge exchangeability, which, unlike its vertex exchangeable counterpart, admits models for networks with sparse and/or power law structure. With this, we settle a longstanding question in statistical network modeling. We characterize all edge exchangeable network models and establish their basic statistical properties. We also identify a tractable family of distributions with a clear interpretation and suitable theoretical properties, whose significance in estimation, prediction, and testing we demonstrate.