In this paper, we study the information-theoretic limits of learning the structure of Bayesian networks, on discrete as well as continuous random variables, from a finite amount of data. We show that, under certain parameterizations of the Bayesian network, the minimum number of samples required to learn the "true" network grows as $\Omega(m)$ for non-sparse and $\Omega(k \log m + k^2/m)$ for sparse Bayesian networks, where $m$ is the number of variables in the network and $k$ is the maximum number of parents of any node in the network. We study various commonly used Bayesian networks, such as Conditional Probability Table (CPT) based networks, Noisy-OR networks, Logistic regression (LR) networks, and Gaussian networks. We identify important parameters of the conditional distributions that affect the complexity of learning such models: the maximum inverse probability for CPT networks, the failure probability for Noisy-OR networks, the norm of the weight vectors for LR networks, and the signal and noise parameters for Gaussian networks. We also show that SparsityBoost, an existing procedure by Brenner and Sontag for learning binary CPT networks, is information-theoretically optimal in the number of variables.
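For reference, the sample-size scaling quoted above can be typeset as a standalone display. This is only a restatement of those rates in a minimal compilable sketch; the symbol $n$ for the number of samples is notation assumed here, not taken from the abstract.

\documentclass{article}
\usepackage{amsmath}
\begin{document}
% Sample-size lower bounds restated from the abstract:
%   n = number of samples, m = number of variables,
%   k = maximum number of parents (in-degree) of any node.
\[
  n = \Omega(m) \ \text{(non-sparse Bayesian networks)}
  \qquad\text{and}\qquad
  n = \Omega\!\left(k \log m + \frac{k^{2}}{m}\right) \ \text{(sparse Bayesian networks)}.
\]
\end{document}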