Mitigating Byzantine Attacks in Federated Learning
- FedML
Most state-of-the-art approaches for mitigating Byzantine behaviors in federated learning (FL), such as Bulyan, leverage the similarity of updates from benign clients. However, in many practical FL scenarios, data is non-IID across clients, so the updates received from even the benign clients are quite dissimilar, and such similarity-based methods converge poorly. As our main contribution, we propose DiverseFL to overcome this challenge in heterogeneous data distribution settings. In every iteration, the FL server in DiverseFL computes a "guiding" gradient for each client over a small sample of that client's local data, which is received only once before training starts. The server then applies a novel per-client criterion for flagging Byzantine updates, comparing each client's update with its corresponding guiding gradient, and updates the model using the gradients received from the non-flagged clients. This overcomes the shortcoming of similarity-based approaches, since a client is flagged based on whether its update matches what is expected from its verified sample data, not on its similarity to the updates of other clients. As we demonstrate through experiments involving neural networks, benchmark datasets, and popular Byzantine attacks, including a strong backdoor attack for non-IID data, DiverseFL not only performs Byzantine mitigation quite effectively, it almost matches the performance of Oracle SGD, in which the server knows the identities of the Byzantine clients.
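The per-client flagging idea can be sketched roughly as follows. This is a minimal illustration, not the paper's actual algorithm: the abstract only says each client's update is compared against its own guiding gradient, so the cosine-similarity test, the `sim_threshold` parameter, and the simple averaging used here are all assumptions for illustration.

```python
import numpy as np

def flag_and_aggregate(client_updates, guiding_gradients, sim_threshold=0.0):
    """Sketch of DiverseFL-style per-client flagging.

    For each client, compare its submitted update with the guiding
    gradient the server computed on that client's verified data sample.
    Here a client is flagged when the cosine similarity between the two
    falls below `sim_threshold` (a hypothetical criterion); the model
    update then averages only the non-flagged clients' updates.
    """
    kept = []
    for update, guide in zip(client_updates, guiding_gradients):
        cos = np.dot(update, guide) / (
            np.linalg.norm(update) * np.linalg.norm(guide) + 1e-12
        )
        if cos > sim_threshold:  # update agrees with this client's guiding gradient
            kept.append(update)
    if not kept:  # every client flagged: skip the update this round
        return np.zeros_like(client_updates[0])
    return np.mean(kept, axis=0)
```

Note that each client is judged only against its own guiding gradient, so benign clients with dissimilar (non-IID) updates are never penalized for disagreeing with each other.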