Adversarial attacks hamper the decision-making ability of neural networks by perturbing the input signal. The addition of small, carefully calculated distortions to an image, for instance, can deceive a well-trained image classification network. In this work, we propose a novel attack technique called Sparse Adversarial and Interpretable Attack Framework (SAIF). Specifically, we design imperceptible attacks that contain low-magnitude perturbations at a small number of pixels, and leverage these sparse attacks to reveal the vulnerability of classifiers. We use the Frank-Wolfe (conditional gradient) algorithm to simultaneously optimize the attack perturbations for bounded magnitude and sparsity, with convergence guarantees. Empirical results show that SAIF computes highly imperceptible and interpretable adversarial examples, and outperforms state-of-the-art sparse attack methods on the ImageNet dataset.
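To make the optimization described above concrete, below is a minimal, illustrative sketch (not the authors' released code or exact formulation) of a Frank-Wolfe style attack that jointly updates a magnitude perturbation constrained to an L-infinity ball and a per-pixel mask constrained to be k-sparse. The function name `saif_attack` and the parameters `epsilon`, `k_pixels`, and `steps` are assumptions for illustration only.

```python
# Minimal sketch of a Frank-Wolfe (conditional gradient) sparse attack.
# Assumed, illustrative names: saif_attack, epsilon, k_pixels, steps.
import torch
import torch.nn.functional as F

def saif_attack(model, x, y, epsilon=8 / 255, k_pixels=500, steps=100):
    """Jointly optimize a magnitude perturbation p (||p||_inf <= epsilon)
    and a pixel mask s (0 <= s <= 1, sum(s) <= k_pixels) via Frank-Wolfe."""
    p = torch.zeros_like(x, requires_grad=True)            # magnitude perturbation
    s = torch.zeros_like(x[:, :1], requires_grad=True)     # per-pixel sparsity mask

    for t in range(steps):
        adv = torch.clamp(x + s * p, 0.0, 1.0)
        # Minimize the negative cross-entropy, i.e. maximize the classifier's loss.
        loss = -F.cross_entropy(model(adv), y)
        grad_p, grad_s = torch.autograd.grad(loss, [p, s])

        # Linear minimization oracle (LMO) over the L_inf ball of radius epsilon.
        v_p = -epsilon * grad_p.sign()

        # LMO over {s : 0 <= s <= 1, sum(s) <= k_pixels}: for this sketch,
        # simply activate the k coordinates with the smallest gradient entries.
        flat = grad_s.view(grad_s.shape[0], -1)
        v_s = torch.zeros_like(flat)
        idx = flat.topk(k_pixels, dim=1, largest=False).indices
        v_s.scatter_(1, idx, 1.0)
        v_s = v_s.view_as(grad_s)

        # Standard Frank-Wolfe convex-combination update with step size 2/(t+2).
        gamma = 2.0 / (t + 2.0)
        with torch.no_grad():
            p += gamma * (v_p - p)
            s += gamma * (v_s - s)

    return torch.clamp(x + (s * p).detach(), 0.0, 1.0)
```

Because each iterate stays a convex combination of feasible points, the perturbation remains inside both constraint sets throughout, which is the property the abstract highlights: bounded magnitude and sparsity are enforced simultaneously rather than by post-hoc projection.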
@article{imtiaz2025_2212.07495,
  title   = {SAIF: Sparse Adversarial and Imperceptible Attack Framework},
  author  = {Tooba Imtiaz and Morgan Kohler and Jared Miller and Zifeng Wang and Masih Eskander and Mario Sznaier and Octavia Camps and Jennifer Dy},
  journal = {arXiv preprint arXiv:2212.07495},
  year    = {2025}
}