The Statistical Performance of Collaborative Inference
The statistical analysis of massive and complex data sets will require the development of algorithms that depend on distributed computing and collaborative inference. Inspired by this, we propose a collaborative framework that aims to estimate the unknown mean of a random variable. In the model we present, a certain number of calculation units, distributed across a communication network represented by a graph, participate in the estimation of this mean by sequentially receiving independent data from the variable while exchanging messages via a stochastic matrix defined over the graph. We give precise conditions on this matrix under which the statistical precision of the individual units is comparable to that of a (gold standard) virtual centralized estimate, even though no single unit has access to all of the data. We show in particular the fundamental role played by both the non-trivial eigenvalues of the matrix and the Ramanujan class of expander graphs, which provide remarkable performance at moderate algorithmic cost.
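As a toy illustration of this setting (a sketch, not the paper's own algorithm), the simulation below runs gossip-style mean estimation on a ring of units. Each unit draws a fresh independent sample per round and mixes its running sum and sample count with its neighbours through a doubly stochastic matrix `W`; the ring topology, the specific weights, and the Gaussian data are all assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

n_units = 10      # calculation units on the network
n_rounds = 2000   # sampling/communication rounds
true_mean = 3.0   # hypothetical mean of the random variable (assumption)

# Hypothetical doubly stochastic mixing matrix on a ring graph:
# each unit keeps weight 1/2 on itself and 1/4 on each neighbour.
W = np.zeros((n_units, n_units))
for i in range(n_units):
    W[i, i] = 0.5
    W[i, (i - 1) % n_units] = 0.25
    W[i, (i + 1) % n_units] = 0.25

# Each unit gossips a running sum of samples and a sample count;
# its local estimate is the ratio of the two.
s = np.zeros(n_units)  # gossiped sums
c = np.zeros(n_units)  # gossiped counts
for _ in range(n_rounds):
    x = rng.normal(true_mean, 1.0, size=n_units)  # one fresh sample per unit
    s = W @ s + x
    c = W @ c + 1.0

est = s / c  # per-unit mean estimates
print(np.max(np.abs(est - true_mean)))
```

Because `W` is doubly stochastic, older samples get averaged ever more uniformly across units, so every local estimate approaches the global sample mean; how fast the mixing happens is governed by the non-trivial eigenvalues of `W`, which is the quantity the abstract highlights.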