LW-FedSSL: Resource-efficient Layer-wise Federated Self-supervised Learning

22 January 2024

Abstract

Many studies integrate federated learning (FL) with self-supervised learning (SSL) to take advantage of raw data distributed across edge devices. However, edge devices often struggle with high computational and communication costs imposed by SSL and FL algorithms. With the deployment of more complex and large-scale models, such as Transformers, these challenges are exacerbated. To tackle this, we propose the Layer-Wise Federated Self-Supervised Learning (LW-FedSSL) approach, which allows edge devices to incrementally train a small part of the model at a time. Specifically, in LW-FedSSL, training is decomposed into multiple stages, with each stage responsible for only a specific layer (or a block of layers) of the model. Since only a portion of the model is active for training at any given time, LW-FedSSL significantly reduces computational requirements. Additionally, only the active model portion needs to be exchanged between the FL server and clients, reducing the communication overhead. This enables LW-FedSSL to jointly address both computational and communication challenges in FL. Depending on the SSL algorithm used, it can achieve up to a $3.34 \times$ reduction in memory usage, $4.20 \times$ fewer computational operations (GFLOPs), and a $5.07 \times$ lower communication cost while maintaining performance comparable to its end-to-end training counterpart. Furthermore, we explore a progressive training strategy called Prog-FedSSL, which offers a $1.84\times$ reduction in GFLOPs and a $1.67\times$ reduction in communication costs while maintaining the same memory requirements as end-to-end training. While the resource efficiency of Prog-FedSSL is lower than that of LW-FedSSL, its performance improvements make it a viable candidate for FL environments with more lenient resource constraints.

View on arXiv

@article{tun2025_2401.11647,
  title={ LW-FedSSL: Resource-efficient Layer-wise Federated Self-supervised Learning },
  author={ Ye Lin Tun and Chu Myaet Thwal and Huy Q. Le and Minh N. H. Nguyen and Choong Seon Hong },
  journal={arXiv preprint arXiv:2401.11647},
  year={ 2025 }
}

Comments on this paper