Approaching the Harm of Gradient Attacks While Only Flipping Labels

28 February 2025

Abstract

Availability attacks are among the strongest training-phase attacks in machine learning, as they render the model unusable. While prior work in distributed ML has demonstrated such an effect via gradient attacks and, more recently, data poisoning, we ask: can similar damage be inflicted solely by flipping training labels, without altering features? In this work, we introduce a novel formalization of label flipping attacks and derive an attacker-optimized loss function that better illustrates label flipping capabilities. To compare the damaging effect of label flipping with that of gradient attacks, we use a setting that allows us to compare their writing power on the ML model. Our contribution is threefold: (1) we provide the first evidence of an availability attack through label flipping alone, (2) we shed light on an interesting interplay between what the attacker gains from more write access versus what they gain from a larger flipping budget, and (3) we compare the power of a targeted label flipping attack to that of an untargeted one.
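To make the idea of a flipping budget concrete, here is a minimal sketch, assuming binary logistic regression with labels in {-1, +1} and an attacker who may flip at most `budget` labels. The function names (`per_sample_grad`, `mean_grad`, `greedy_label_flip`) and the greedy one-step objective (push the poisoned mean gradient against the clean one) are hypothetical illustrations; this is not the authors' formalization or their attacker-optimized loss.

```python
import math
from typing import List, Sequence

Vector = List[float]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def per_sample_grad(w: Sequence[float], x: Sequence[float], y: int) -> Vector:
    """Gradient of log(1 + exp(-y <w, x>)) w.r.t. w, for a label y in {-1, +1}."""
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    coeff = -y * sigmoid(-margin)
    return [coeff * xi for xi in x]


def mean_grad(w: Sequence[float], X: Sequence[Sequence[float]], Y: Sequence[int]) -> Vector:
    """Average of the per-sample gradients over the whole dataset."""
    total = [0.0] * len(w)
    for x, y in zip(X, Y):
        for j, gj in enumerate(per_sample_grad(w, x, y)):
            total[j] += gj
    return [t / len(X) for t in total]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))


def greedy_label_flip(w: Sequence[float], X: Sequence[Sequence[float]],
                      Y: Sequence[int], budget: int) -> List[int]:
    """Return sorted indices of at most `budget` labels to flip (hypothetical attack).

    Objective (an assumption for illustration): minimise <g_poisoned, g_clean>.
    Each flip shifts the mean gradient by an independent additive term
    (delta_i / n), so taking the most negative scores is exact for this
    one-step linear objective; the 1/n factor does not change the ranking.
    """
    g_clean = mean_grad(w, X, Y)
    scores = []
    for i, (x, y) in enumerate(zip(X, Y)):
        g_old = per_sample_grad(w, x, y)
        g_new = per_sample_grad(w, x, -y)  # gradient after flipping label i
        delta = [a - b for a, b in zip(g_new, g_old)]
        scores.append((dot(delta, g_clean), i))
    scores.sort()
    # Only keep flips that actually move the gradient against g_clean.
    return sorted(i for s, i in scores[:budget] if s < 0)


# Usage: four correctly classified 1-D points (second feature unused).
w = [1.0, 0.0]
X = [[1.0, 0.0], [2.0, 0.0], [-1.0, 0.0], [-3.0, 0.0]]
Y = [1, 1, -1, -1]
print(greedy_label_flip(w, X, Y, budget=2))  # prints [1, 3]
```

For this loss, flipping label i changes its per-sample gradient by exactly y_i x_i, because σ(z) + σ(−z) = 1. In this toy setting a label-flipping adversary can therefore only add vectors taken from the data (scaled by 1/n), whereas in the usual gradient-attack model the adversary may write an arbitrary vector; this is one simple way to see why the two threat models differ in writing power.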


@article{el-kabid2025_2503.00140,
  title={Approaching the Harm of Gradient Attacks While Only Flipping Labels},
  author={Abdessamad El-Kabid and El-Mahdi El-Mhamdi},
  journal={arXiv preprint arXiv:2503.00140},
  year={2025}
}
