Robustness to label noise is a significant challenge in federated learning (FL). From a data-centric perspective, the quality of distributed datasets cannot be guaranteed, because annotations from different clients contain complicated label noise of varying degrees, which degrades model performance. There have been some early attempts to tackle noisy labels in FL. However, benchmark studies that comprehensively evaluate their practical performance under unified settings are lacking. To this end, we propose FNBench, the first benchmark study to provide an experimental investigation that considers three diverse label noise patterns: synthetic label noise, imperfect human-annotation errors, and systematic errors. Our evaluation incorporates eighteen state-of-the-art methods over five image recognition datasets and one text classification dataset. We also provide observations to explain why noisy labels impair FL and, based on these observations, exploit a representation-aware regularization method to enhance the robustness of existing methods against noisy labels. Finally, we discuss the limitations of this work and propose three future directions. To support related research communities, our source code is open-sourced at this https URL.
@article{jiang2025_2505.06684,
  title={FNBench: Benchmarking Robust Federated Learning against Noisy Labels},
  author={Xuefeng Jiang and Jia Li and Nannan Wu and Zhiyuan Wu and Xujing Li and Sheng Sun and Gang Xu and Yuwei Wang and Qi Li and Min Liu},
  journal={arXiv preprint arXiv:2505.06684},
  year={2025}
}