On the Difficulty of Selecting Ising Models with Approximate Recovery

Abstract

In this paper, we consider the problem of estimating the underlying graphical model of an Ising distribution given a number of independent and identically distributed samples. We adopt an approximate recovery criterion that allows for a number of missed edges or incorrectly included edges, thus departing from the extensive literature considering the exact recovery problem. Our main results provide information-theoretic lower bounds on the required number of samples (i.e., the sample complexity) for graph classes imposing constraints on the number of edges, maximal degree, and sparse separation properties. We identify a broad range of scenarios where, either up to constant factors or logarithmic factors, our lower bounds match the best known lower bounds for the exact recovery criterion, several of which are known to be tight or near-tight. Hence, in these cases, we prove that the approximate recovery problem is not much easier than the exact recovery problem. Our bounds are obtained via a modification of Fano's inequality for handling the approximate recovery criterion, along with suitably designed ensembles of graphs that can broadly be classed into two categories: (i) those whose graphs contain several isolated edges or cliques and are thus difficult to distinguish from the empty graph; (ii) those whose graphs have certain groups of nodes that are highly correlated, thus making it difficult to determine precisely which edges connect them. We support our theoretical results on these ensembles with numerical experiments.
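
As a rough illustration of the approximate recovery criterion mentioned above, the following minimal sketch counts missed and incorrectly included edges between a true graph and an estimate, and declares success when the total is within a tolerance. The function names and the threshold parameter `q_max` are hypothetical choices made for this example; the abstract does not specify the exact form of the criterion, so the sum of the two error counts is an assumption.

```python
def edge_set(edges):
    """Normalize an iterable of node pairs into a set of undirected edges."""
    return {frozenset(e) for e in edges}


def edit_distance(true_edges, est_edges):
    """Count missed edges plus incorrectly included edges (assumed criterion)."""
    true_set, est_set = edge_set(true_edges), edge_set(est_edges)
    missed = true_set - est_set   # edges in the true graph but not the estimate
    extra = est_set - true_set    # edges in the estimate but not the true graph
    return len(missed) + len(extra)


def approx_recovered(true_edges, est_edges, q_max):
    """Return True if the estimate is within q_max edge errors of the truth.

    q_max is a hypothetical tolerance; q_max = 0 reduces to exact recovery.
    """
    return edit_distance(true_edges, est_edges) <= q_max


# Toy example: one edge missed and one edge wrongly included.
true_graph = [(0, 1), (1, 2), (2, 3)]
estimate = [(1, 0), (1, 2), (3, 4)]

distance = edit_distance(true_graph, estimate)
print(distance, approx_recovered(true_graph, estimate, q_max=2))
```

Setting `q_max = 0` in this sketch corresponds to the exact recovery problem, which makes the relationship between the two criteria explicit.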