
Collaborative Representation Learning

Abstract

This paper investigates an information-theoretic approach to the problem of collaborative representation learning: how to extract salient features of statistical relationships in order to build cooperatively meaningful representations of some relevant content. Modeling the structure of data and its hidden representations by independent and identically distributed samples, our goal is to study fundamental limits of the so-called Two-way Collaborative Representation Learning (TW-CRL) and the Collaborative Distributed Representation Learning (CDRL) problems. The TW-CRL problem consists of two distant encoders that separately observe marginal (dependent) components $X_1$ and $X_2$ and can cooperate through multiple exchanges of limited information, with the aim of learning hidden representations $(Y_1, Y_2)$, which can be arbitrarily dependent on $(X_1, X_2)$. In CDRL, on the other hand, there are two cooperating encoders, and the learner of the hidden representation $Y$ is a third node that can listen to the exchanges between the two encoders. The relevance (figure of merit) of such learned representations is measured in terms of a normalized (per-sample) multi-letter mutual information metric. Inner and outer bounds to the complexity-relevance region of these problems are derived, from which optimality is characterized for several cases of interest. The resulting complexity-relevance regions are finally evaluated for binary symmetric and Gaussian statistical models, showing how to identify comparatively random features that represent complexity-constrained statistics for the inference of the hidden representations.
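As a point of reference for the binary symmetric evaluation mentioned above, the sketch below computes the mutual information $I(X_1; X_2) = 1 - h_2(p)$ in bits for a doubly symmetric binary source with crossover probability $p$ (i.e., $X_2$ equals $X_1$ flipped with probability $p$), the quantity that upper-bounds the achievable relevance in such models. The function names are illustrative, not taken from the paper.

```python
from math import log2

def h2(p):
    # Binary entropy function in bits; h2(0) = h2(1) = 0 by convention.
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

def relevance_dsbs(p):
    # I(X1; X2) = 1 - h2(p) for a doubly symmetric binary source
    # with crossover probability p (an illustrative bound on relevance).
    return 1.0 - h2(p)
```

For instance, independent components ($p = 0.5$) give zero mutual information, while perfectly correlated components ($p = 0$) give one bit per sample.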
