In this paper, we study the information theoretic limits of learning the structure of Bayesian networks from data. We show that for Bayesian networks on continuous as well as discrete random variables, there exists a parameterization of the Bayesian network such that, the minimum number of samples required to learn the "true" Bayesian network grows as , where is the number of variables in the network. Further, for sparse Bayesian networks, where the number of parents of any variable in the network is restricted to be at most for , the minimum number of samples required grows as . We discuss conditions under which these limits are achieved. For Bayesian networks over continuous variables, we obtain results for Gaussian regression and Gumbel Bayesian networks. While for the discrete variables, we obtain results for Noisy-OR, Conditional Probability Table (CPT) based Bayesian networks and Logistic regression networks. Finally, as a byproduct, we also obtain lower bounds on the sample complexity of feature selection in logistic regression and show that the bounds are sharp.
View on arXiv