Assessing model goodness of fit, a first step in data analysis, is easy to state but often difficult to carry out in practice, particularly for data that are large and sparse, or small-sample but structured. We focus on this fundamental problem for relational data, which can be represented in the form of a network: given one observed network, does the proposed model fit the data? Specifically, we construct finite-sample tests for three different variants of the stochastic blockmodel (SBM). The main building blocks are the versions with known block assignment, and we propose extensions to the latent block case. We describe the Markov bases and the marginal polytope of these models. The methodology extends to any mixture of log-linear models on discrete data, and as such is the first application of algebraic statistics sampling for latent-variable models.