
More Communication Does Not Result in Smaller Generalization Error in Federated Learning

International Symposium on Information Theory (ISIT), 2023
Abstract

We study the generalization error of statistical learning models in a Federated Learning (FL) setting. Specifically, there are $K$ devices or clients, each holding its own independent dataset of size $n$. Individual models, learned locally via Stochastic Gradient Descent, are aggregated (averaged) by a central server into a global model and then sent back to the devices. We consider multiple rounds (say $R \in \mathbb{N}^*$) of model aggregation and study the effect of $R$ on the generalization error of the final aggregated model. We establish an upper bound on the generalization error that accounts explicitly for the effect of $R$ (in addition to the number of participating devices $K$ and the dataset size $n$). It is observed that, for fixed $(n, K)$, the bound increases with $R$, suggesting that the generalization of such learning algorithms is negatively affected by more frequent communication with the parameter server. Combined with the fact that the empirical risk generally decreases for larger values of $R$, this indicates that $R$ may be a parameter to optimize in order to reduce the population risk of FL algorithms. The results of this paper, which extend straightforwardly to the heterogeneous data setting, are also illustrated through numerical examples.
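
As a rough illustration of the protocol described in the abstract, below is a minimal Python sketch of multi-round federated averaging (local SGD followed by server-side averaging, repeated $R$ times) on a toy least-squares problem. The function names, hyperparameters, and toy data are illustrative assumptions, not the authors' experimental setup.

```python
import numpy as np

def local_sgd(w, X, y, lr, steps, rng):
    """Run plain SGD on one client's least-squares loss, starting from the global model w."""
    w = w.copy()
    for _ in range(steps):
        i = rng.integers(len(y))                 # sample one data point
        grad = (X[i] @ w - y[i]) * X[i]          # gradient of 0.5 * (x.w - y)^2
        w -= lr * grad
    return w

def fedavg(client_data, R, d, lr=0.01, steps_per_round=50, seed=0):
    """R rounds: each client runs local SGD, then the server averages the K local models."""
    rng = np.random.default_rng(seed)
    w_global = np.zeros(d)
    for _ in range(R):
        local_models = [local_sgd(w_global, X, y, lr, steps_per_round, rng)
                        for X, y in client_data]
        w_global = np.mean(local_models, axis=0)  # server-side aggregation
    return w_global

# Toy homogeneous setting: K clients, each with n i.i.d. samples from one linear model.
K, n, d = 10, 100, 5
rng = np.random.default_rng(1)
w_true = rng.standard_normal(d)
client_data = []
for _ in range(K):
    X = rng.standard_normal((n, d))
    y = X @ w_true + 0.1 * rng.standard_normal(n)
    client_data.append((X, y))

for R in (1, 5, 20):
    w = fedavg(client_data, R, d)
    print(f"R={R:2d}  ||w - w_true|| = {np.linalg.norm(w - w_true):.4f}")
```

In this sketch, increasing $R$ with the per-round work held fixed drives the empirical risk down, while the paper's bound suggests the generalization error can grow with $R$; their trade-off is what makes $R$ a candidate tuning parameter for the population risk.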
