
Information Recovery from Pairwise Measurements

Abstract

This paper is concerned with jointly recovering $n$ node-variables $\left\{ x_{i}\right\}_{1\leq i\leq n}$ from a collection of pairwise difference measurements. Imagine we acquire a few observations taking the form of $x_{i}-x_{j}$; the observation pattern is represented by a measurement graph $\mathcal{G}$ with an edge set $\mathcal{E}$ such that $x_{i}-x_{j}$ is observed if and only if $(i,j)\in\mathcal{E}$. To account for noisy measurements in a general manner, we model the data acquisition process by a set of channels with given input/output transition measures. Employing information-theoretic tools applied to channel decoding problems, we develop a \emph{unified} framework to characterize the fundamental recovery criterion, which accommodates general graph structures, alphabet sizes, and channel transition measures. In particular, our results isolate a family of \emph{minimum channel divergence measures} to characterize the degree of measurement corruption, which together with the size of the minimum cut of $\mathcal{G}$ dictates the feasibility of exact information recovery. For various homogeneous graphs, the recovery condition depends almost only on the edge sparsity of the measurement graph irrespective of other graphical metrics; alternatively, the minimum sample complexity required for these graphs scales like \[ \text{minimum sample complexity }\asymp\frac{n\log n}{\mathsf{Hel}_{1/2}^{\min}} \] for a certain information metric $\mathsf{Hel}_{1/2}^{\min}$ defined in the main text, as long as the alphabet size is not super-polynomial in $n$. We apply our general theory to three concrete applications, including the stochastic block model, the outlier model, and the haplotype assembly problem. Our theory leads to order-wise tight recovery conditions for all these scenarios.
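To make the measurement model concrete, the following is a minimal Python sketch, not the paper's algorithm. It assumes a finite alphabet with differences taken modulo the alphabet size, a random measurement graph in which each edge is present independently, and a simple outlier-style channel that replaces a measurement by a uniformly random symbol. The function names and the parameters `edge_prob` and `flip_prob` are hypothetical and chosen only for illustration. The noise-free propagation step shows why the variables can be identified only up to a global offset, and only when the measurement graph is connected.

```python
import random
from collections import deque


def simulate_pairwise_measurements(n, alphabet_size, edge_prob, flip_prob, seed=0):
    """Sample hidden variables and noisy pairwise differences (illustrative model)."""
    rng = random.Random(seed)
    x = [rng.randrange(alphabet_size) for _ in range(n)]
    measurements = {}
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() >= edge_prob:
                continue  # (i, j) is not in the edge set, so nothing is observed
            diff = (x[i] - x[j]) % alphabet_size  # assumption: modular difference
            if rng.random() < flip_prob:
                # assumption: corrupted measurements become uniform random symbols
                diff = rng.randrange(alphabet_size)
            measurements[(i, j)] = diff
    return x, measurements


def propagate_from_root(n, alphabet_size, measurements, root=0):
    """Fix the root to 0 and propagate differences by breadth-first search.

    Exact only when the measurements are noise-free; nodes that are not
    connected to the root keep the value None.
    """
    neighbors = {i: [] for i in range(n)}
    for (i, j), d in measurements.items():
        neighbors[i].append((j, -d))  # x_j = x_i - d
        neighbors[j].append((i, d))   # x_i = x_j + d
    estimate = [None] * n
    estimate[root] = 0
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v, delta in neighbors[u]:
            if estimate[v] is None:
                estimate[v] = (estimate[u] + delta) % alphabet_size
                queue.append(v)
    return estimate
```

With `flip_prob=0.0` and a connected graph, the propagated estimate matches the hidden variables after adding a single global shift. With noise, this naive propagation accumulates errors along paths, which is why recovery guarantees depend on how corrupted the channels are and on how well connected the graph is.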
