
Population Predictive Checks

Abstract

Bayesian modeling has become a staple for researchers to articulate assumptions and develop methods tailored for specific data applications. Thanks to recent developments in approximate posterior inference, researchers can easily build, use, and revise complicated Bayesian models for large and rich data. These new abilities, however, bring into focus the problem of model criticism. Researchers need tools to diagnose the fitness of their models, to understand where they fall short, and to guide their revision. In this paper we develop a new method for Bayesian model criticism, the population predictive check (POP-PC). POP-PCs are built on posterior predictive checks (PPCs), a seminal method that checks a model by assessing the posterior predictive distribution on the observed data. However, PPCs use the data twice -- both to calculate the posterior predictive and to evaluate it -- which can lead to overconfident assessments of the quality of a model. POP-PCs, in contrast, compare the posterior predictive distribution to a draw from the population distribution, which in practice is a held-out dataset. We prove this strategy, which blends Bayesian modeling with frequentist assessment, is calibrated, unlike the PPC. Moreover, we demonstrate that calibrating PPC p-values post hoc does not resolve the "double use of the data" problem. Finally, we study POP-PCs on classical regression and a hierarchical model of text data.
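
To illustrate the difference between the two checks, here is a minimal sketch (not the authors' code) for a conjugate Normal model with known variance: the PPC evaluates the posterior predictive against the same data used to form the posterior, while the POP-PC evaluates it against a held-out dataset standing in for the population. The function names and the choice of test statistic (the sample mean) are assumptions made only for this example.

import numpy as np

rng = np.random.default_rng(0)

def posterior_params(y, sigma2=1.0, mu0=0.0, tau2=10.0):
    """Conjugate posterior for the mean of a Normal with known variance."""
    n = len(y)
    post_var = 1.0 / (1.0 / tau2 + n / sigma2)
    post_mean = post_var * (mu0 / tau2 + y.sum() / sigma2)
    return post_mean, post_var

def predictive_pvalue(y_fit, y_eval, n_rep=5000, sigma2=1.0):
    """P(T(y_rep) >= T(y_eval)), y_rep drawn from the posterior predictive.

    PPC:    y_eval is the same data used to form the posterior (y_fit).
    POP-PC: y_eval is a held-out dataset standing in for the population.
    """
    post_mean, post_var = posterior_params(y_fit, sigma2=sigma2)
    t_obs = y_eval.mean()  # test statistic T
    mu_draws = rng.normal(post_mean, np.sqrt(post_var), size=n_rep)
    y_rep = rng.normal(mu_draws[:, None], np.sqrt(sigma2),
                       size=(n_rep, len(y_eval)))
    t_rep = y_rep.mean(axis=1)
    return np.mean(t_rep >= t_obs)

# Data from the true population, split into fitting and held-out halves.
y = rng.normal(loc=0.3, scale=1.0, size=200)
y_fit, y_heldout = y[:100], y[100:]

ppc_p = predictive_pvalue(y_fit, y_fit)         # double use of the data
pop_pc_p = predictive_pvalue(y_fit, y_heldout)  # compare to held-out data
print(f"PPC p-value:    {ppc_p:.3f}")
print(f"POP-PC p-value: {pop_pc_p:.3f}")

In this toy setting the posterior is available in closed form; in the applications discussed in the paper, the posterior predictive draws would instead come from approximate posterior inference.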
