
Knowledge Distillation for Federated Learning: a Practical Guide

Alessio Mora
Irene Tenison
Paolo Bellavista
Irina Rish
Abstract

Federated Learning (FL) enables the training of Deep Learning models without centrally collecting possibly sensitive raw data. The most widely used algorithms for FL are parameter-averaging schemes (e.g., Federated Averaging) that, however, have well-known limitations: they require model homogeneity, incur high communication cost, and perform poorly in the presence of heterogeneous data distributions. Federated adaptations of regular Knowledge Distillation (KD) can solve or mitigate the weaknesses of parameter-averaging FL algorithms, while possibly introducing other trade-offs. In this article, we originally present a focused review of the state-of-the-art KD-based algorithms specifically tailored for FL, providing both a novel classification of the existing approaches and a detailed technical description of their pros, cons, and trade-offs.
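To make the parameter-averaging baseline concrete, the following is a minimal illustrative sketch of the weighted aggregation step at the heart of Federated Averaging; it is not the authors' code, and the flat-list representation of model weights is a simplifying assumption for illustration.

```python
# Illustrative sketch of FedAvg-style parameter averaging.
# Assumption: each client's model parameters are a flat list of floats,
# and each update is weighted by the client's local dataset size.

def fedavg_aggregate(client_weights, client_sizes):
    """Return the size-weighted average of client parameter vectors."""
    total = sum(client_sizes)
    num_params = len(client_weights[0])
    global_weights = [0.0] * num_params
    for weights, size in zip(client_weights, client_sizes):
        for i, w in enumerate(weights):
            global_weights[i] += (size / total) * w
    return global_weights

# Two clients with unequal data shares: the larger client dominates.
clients = [[1.0, 2.0], [3.0, 4.0]]
sizes = [1, 3]
print(fedavg_aggregate(clients, sizes))  # → [2.5, 3.5]
```

Note that this scheme only works when all clients share the same model architecture, which is exactly the model-homogeneity constraint that KD-based FL approaches aim to relax.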

@article{mora2025_2211.04742,
  title={Knowledge Distillation for Federated Learning: a Practical Guide},
  author={Alessio Mora and Irene Tenison and Paolo Bellavista and Irina Rish},
  journal={arXiv preprint arXiv:2211.04742},
  year={2025}
}