We develop FedCluster, a novel federated learning framework with improved optimization efficiency, and investigate its theoretical convergence properties. FedCluster groups the devices into multiple clusters that perform federated learning cyclically in each learning round. Consequently, each learning round of FedCluster consists of multiple cycles of meta-updates that boost the overall convergence. In nonconvex optimization, we show that FedCluster with the devices implementing the local stochastic gradient descent (SGD) algorithm achieves a faster convergence rate than the conventional federated averaging (FedAvg) algorithm in the presence of device-level data heterogeneity. We conduct experiments on deep learning applications and demonstrate that FedCluster converges significantly faster than conventional federated learning under diverse levels of device-level data heterogeneity for a variety of local optimizers.
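As a rough illustration of the cyclic, cluster-wise meta-update described above, the following minimal sketch runs FedAvg-style averaging over one cluster at a time within each learning round, with devices performing local SGD on a synthetic least-squares problem. The function names (`local_sgd`, `fedcluster_round`), cluster sizes, and hyperparameters are illustrative assumptions and are not taken from the paper's implementation.

```python
# Minimal sketch of FedCluster-style cyclic cluster updates (assumed setup,
# not the authors' code): within a learning round, each cluster in turn runs
# local SGD on its devices and the averaged model is passed to the next cluster.
import numpy as np

rng = np.random.default_rng(0)

def make_device_data(n_samples=64, dim=10):
    """Synthetic heterogeneous data: each device has its own linear model."""
    X = rng.normal(size=(n_samples, dim))
    w_true = rng.normal(size=dim)          # device-level data heterogeneity
    y = X @ w_true + 0.1 * rng.normal(size=n_samples)
    return X, y

def local_sgd(w, data, steps=5, lr=0.01, batch=8):
    """Local SGD on one device's least-squares loss, starting from the global model."""
    X, y = data
    w = w.copy()
    for _ in range(steps):
        idx = rng.choice(len(y), size=batch, replace=False)
        grad = X[idx].T @ (X[idx] @ w - y[idx]) / batch
        w -= lr * grad
    return w

def fedcluster_round(w, clusters):
    """One learning round: clusters perform FedAvg-style meta-updates cyclically."""
    for cluster in clusters:                      # one meta-update cycle per cluster
        local_models = [local_sgd(w, data) for data in cluster]
        w = np.mean(local_models, axis=0)         # average only the active cluster
    return w

# 12 devices grouped into 3 clusters of 4 (illustrative choice).
devices = [make_device_data() for _ in range(12)]
clusters = [devices[i:i + 4] for i in range(0, 12, 4)]

w = np.zeros(10)
for _ in range(20):                               # learning rounds
    w = fedcluster_round(w, clusters)
```

The key difference from plain FedAvg in this sketch is that the global model is refreshed several times per learning round, once after each cluster's averaging step, rather than only once after all devices report back.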