
Scalable Maximum Entropy Population Synthesis via Persistent Contrastive Divergence

Mirko Degli Esposti
Main: 18 pages · Bibliography: 2 pages · Appendix: 6 pages · 6 figures · 14 tables
Abstract

Maximum entropy (MaxEnt) modelling provides a principled framework for generating synthetic populations from aggregate census data, without access to individual-level microdata. The bottleneck of exact-enumeration approaches is expectation computation by explicit summation over the full tuple space $\mathcal{X}$, which becomes infeasible for more than $K \approx 20$ categorical attributes; sampling-based alternatives exist but rely on Metropolis-type schemes that require proposal tuning and rejection steps. We propose \emph{GibbsPCDSolver}, a stochastic replacement for this computation based on Persistent Contrastive Divergence (PCD): a persistent pool of $N$ synthetic individuals is updated by Gibbs sweeps at each gradient step, providing a stochastic approximation of the model expectations without ever materialising $\mathcal{X}$. We validate the approach on controlled benchmarks and on \emph{Syn-ISTAT}, a $K = 15$ Italian demographic benchmark with analytically exact marginal targets derived from ISTAT-inspired conditional probability tables. Scaling experiments across $K \in \{12, 20, 30, 40, 50\}$ confirm that GibbsPCDSolver maintains $\mathrm{MRE} \in [0.010, 0.018]$ while $|\mathcal{X}|$ grows eighteen orders of magnitude, with runtime scaling as $O(K)$ rather than $O(|\mathcal{X}|)$. On Syn-ISTAT, GibbsPCDSolver reaches $\mathrm{MRE} = 0.03$ on training constraints and -- crucially -- produces populations with effective sample size $N_{\mathrm{eff}} = N$ versus $N_{\mathrm{eff}} \approx 0.012\,N$ for generalised raking, an $86.8\times$ diversity advantage that is essential for agent-based urban simulations.
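The PCD loop the abstract describes -- a persistent pool of synthetic individuals resampled by Gibbs sweeps, whose empirical moments stand in for the exact model expectations in the dual gradient -- can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes only unary (per-attribute marginal) features, so each Gibbs conditional happens to be independent of the other attributes; with the interaction features a real population model would carry, the conditionals would depend on the rest of the tuple. The target marginals here are random placeholders, not ISTAT data.

```python
import numpy as np

rng = np.random.default_rng(0)

K, C, N = 6, 3, 500  # attributes, categories per attribute, persistent pool size
# Hypothetical target marginals (each row sums to 1); real targets would come
# from aggregate census tables.
target = rng.dirichlet(np.ones(C), size=K)

lam = np.zeros((K, C))                    # MaxEnt natural parameters
pool = rng.integers(0, C, size=(N, K))    # persistent pool, never reset

lr = 0.5
for step in range(2000):
    # One Gibbs sweep over the pool: resample each attribute in turn from its
    # conditional under the current parameters (here independent of the rest,
    # because the sketch uses unary features only).
    for k in range(K):
        p = np.exp(lam[k] - lam[k].max())
        p /= p.sum()
        pool[:, k] = rng.choice(C, size=N, p=p)
    # Stochastic approximation of the model marginals from the pool,
    # replacing the explicit summation over the full tuple space.
    model_marg = np.stack(
        [np.bincount(pool[:, k], minlength=C) / N for k in range(K)]
    )
    lam += lr * (target - model_marg)     # gradient ascent on the MaxEnt dual

final_marg = np.stack(
    [np.bincount(pool[:, k], minlength=C) / N for k in range(K)]
)
mre = np.abs(final_marg - target).max()   # worst-case marginal error
```

Because every individual in the pool is a fresh sample rather than a reweighted copy, the resulting population has effective sample size equal to the pool size, which is the diversity property the abstract contrasts with generalised raking.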
