
Distributed Information-Theoretic Clustering

Abstract

We study a novel multi-terminal source coding setup motivated by the biclustering problem. Two separate encoders observe two i.i.d. sequences $X^n$ and $Y^n$, respectively. The goal is to find rate-limited encodings $f(x^n)$ and $g(y^n)$ that maximize the normalized mutual information $I(f(X^n); g(Y^n))/n$. We discuss connections of this problem with hypothesis testing against independence, pattern recognition, and the information bottleneck method. Improving previous cardinality bounds for the inner and outer bounds allows us to thoroughly study the special case of a binary symmetric source and to quantify the gap between the inner and the outer bound in this special case. Furthermore, we investigate a multiple description (MD) extension of the Chief Executive Officer (CEO) problem with mutual information constraint. Surprisingly, this MD-CEO problem permits a tight single-letter characterization of the achievable region.
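To make the objective concrete, the following is a minimal sketch that evaluates $I(f(X^n); g(Y^n))/n$ by brute force for a small block length. It is not the paper's method. It assumes a doubly symmetric binary source, that is, $X$ uniform on $\{0,1\}$ and $Y = X \oplus Z$ with $Z \sim \mathrm{Bernoulli}(p)$, as one reading of the "binary symmetric source" special case. The names `dsbs_pmf` and `normalized_mi` and the example encoders are hypothetical, and the rate constraints on $f$ and $g$ are not enforced here; the caller is expected to pass encoders with suitably small ranges.

```python
import itertools
import math


def dsbs_pmf(n, p):
    """Joint pmf of (x^n, y^n) for n i.i.d. pairs from an assumed doubly
    symmetric binary source: X uniform, Y equals X flipped with probability p."""
    pmf = {}
    for x in itertools.product((0, 1), repeat=n):
        for y in itertools.product((0, 1), repeat=n):
            flips = sum(a != b for a, b in zip(x, y))
            pmf[(x, y)] = 0.5 ** n * p ** flips * (1 - p) ** (n - flips)
    return pmf


def normalized_mi(n, p, f, g):
    """Return I(f(X^n); g(Y^n)) / n in bits for deterministic encoders f and g.

    f and g map a length-n tuple of bits to any hashable label.
    Rate limits are NOT checked (assumption: the caller handles them).
    """
    # Push the source distribution through the two encoders.
    joint = {}
    for (x, y), w in dsbs_pmf(n, p).items():
        key = (f(x), g(y))
        joint[key] = joint.get(key, 0.0) + w

    # Marginals of the encoder outputs.
    pu, pv = {}, {}
    for (u, v), w in joint.items():
        pu[u] = pu.get(u, 0.0) + w
        pv[v] = pv.get(v, 0.0) + w

    # Mutual information in bits, skipping zero-probability pairs.
    mi = sum(
        w * math.log2(w / (pu[u] * pv[v]))
        for (u, v), w in joint.items()
        if w > 0
    )
    return mi / n


def first_bit(s):
    """Hypothetical one-bit encoder: keep only the first source symbol."""
    return s[0]


def identity(s):
    """Hypothetical unconstrained encoder: keep the whole block."""
    return s
```

For example, `normalized_mi(2, 0.1, first_bit, first_bit)` compares two one-bit descriptions of a length-2 block, while `normalized_mi(2, 0.1, identity, identity)` gives the unconstrained value $I(X;Y)$ per symbol. The enumeration grows as $4^n$, so this is only useful for building intuition at very small $n$.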
