
The Statistical Performance of Collaborative Inference

Abstract

The statistical analysis of massive and complex data sets will require the development of algorithms that depend on distributed computing and collaborative inference. Inspired by this, we propose a collaborative framework that aims to estimate the unknown mean θ of a random variable X. In the model we present, a certain number of calculation units, distributed across a communication network represented by a graph, participate in the estimation of θ by sequentially receiving independent data from X while exchanging messages via a stochastic matrix A defined over the graph. We give precise conditions on the matrix A under which the statistical precision of the individual units is comparable to that of a (gold standard) virtual centralized estimate, even though each unit does not have access to all of the data. We show in particular the fundamental role played by both the non-trivial eigenvalues of A and the Ramanujan class of expander graphs, which provide remarkable performance for moderate algorithmic cost.
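To make the setting concrete, the following is a minimal simulation sketch, not the paper's algorithm. The abstract does not state the update rule, so the one used here is an assumption: each unit first averages the current estimates of its neighbours through A, then folds in its own fresh observation as a running mean. The ring graph, the Gaussian law for X, and the names `ring_matrix` and `collaborative_means` are likewise hypothetical choices for illustration; a ring is a simple stand-in and is not a Ramanujan expander.

```python
import random


def ring_matrix(n):
    """Doubly stochastic matrix on a ring graph (illustrative choice of A):
    each unit weights itself and its two neighbours by 1/3."""
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in (i - 1, i, i + 1):
            A[i][j % n] += 1.0 / 3.0
    return A


def collaborative_means(A, theta, steps, rng):
    """Simulate the assumed update: mix neighbours' estimates via A,
    then average in one fresh observation per unit.

    Returns the per-unit estimates and the virtual centralized mean
    computed from all observations drawn by all units."""
    n = len(A)
    est = [0.0] * n
    total = 0.0
    for t in range(steps):
        # One independent draw of X per unit (X ~ N(theta, 1) is an assumption).
        fresh = [rng.gauss(theta, 1.0) for _ in range(n)]
        total += sum(fresh)
        # Message exchange: unit i receives the A-weighted average of estimates.
        mixed = [sum(A[i][j] * est[j] for j in range(n)) for i in range(n)]
        # Running-mean step combining the mixed estimate with the new datum.
        est = [(t * mixed[i] + fresh[i]) / (t + 1) for i in range(n)]
    central = total / (n * steps)
    return est, central


rng = random.Random(0)
estimates, central = collaborative_means(ring_matrix(10), theta=2.0, steps=2000, rng=rng)
print("centralized estimate:", central)
print("largest gap between a unit and the centralized estimate:",
      max(abs(e - central) for e in estimates))
```

Under this assumed rule, because the columns of A also sum to one, the average of the units' estimates coincides with the centralized mean at every step. How closely each individual unit tracks it depends on how quickly A mixes, which is consistent with the role the abstract assigns to the non-trivial eigenvalues of A.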
