Selective Masking Adversarial Attack on Automatic Speech Recognition Systems

Abstract

Extensive research has shown that Automatic Speech Recognition (ASR) systems are vulnerable to audio adversarial attacks. Existing attacks focus mainly on single-source scenarios and ignore dual-source scenarios in which two people speak simultaneously. To bridge this gap, we propose the Selective Masking Adversarial (SMA) attack, which ensures that, in dual-source scenarios, one audio source is selected for recognition while the other is muted. To better fit the dual-source setting, the SMA attack constructs the normal dual-source audio from the muted audio and the selected audio. It initializes the adversarial perturbation with small Gaussian noise and iteratively optimizes it using a selective masking optimization algorithm. Extensive experiments demonstrate that the SMA attack generates effective and imperceptible audio adversarial examples in the dual-source scenario, achieving an average attack success rate of 100% and a signal-to-noise ratio of 37.15 dB on Conformer-CTC, outperforming the baselines.
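To make the quantities in the abstract concrete, the following is a minimal sketch of how a perturbation might be initialized with small Gaussian noise and how a signal-to-noise ratio in dB is commonly computed. It is not the authors' implementation: the function names, the default `sigma`, the additive mixing of the two sources, and the SNR definition (10·log10 of signal energy over perturbation energy) are all assumptions made for illustration, since the abstract does not specify them. The selective masking optimization itself is not shown.

```python
import numpy as np


def mix_sources(selected, muted):
    """Hypothetical dual-source mix: sample-wise sum, truncated to the shorter clip.
    Additive mixing is an assumption, not taken from the paper."""
    selected = np.asarray(selected, dtype=np.float64)
    muted = np.asarray(muted, dtype=np.float64)
    n = min(len(selected), len(muted))
    return selected[:n] + muted[:n]


def init_perturbation(length, sigma=1e-3, seed=0):
    """Draw a small zero-mean Gaussian perturbation (sigma is an assumed value)."""
    rng = np.random.default_rng(seed)
    return rng.normal(0.0, sigma, size=length)


def snr_db(clean, perturbation):
    """Standard SNR in dB: 10 * log10(signal energy / perturbation energy)."""
    clean = np.asarray(clean, dtype=np.float64)
    perturbation = np.asarray(perturbation, dtype=np.float64)
    noise_energy = np.sum(perturbation ** 2)
    if noise_energy == 0:
        return float("inf")  # no perturbation at all
    return float(10.0 * np.log10(np.sum(clean ** 2) / noise_energy))


if __name__ == "__main__":
    # Two synthetic one-second "speakers" at 16 kHz (placeholder sine tones).
    t = np.arange(16000) / 16000.0
    dual = mix_sources(np.sin(2 * np.pi * 220 * t), np.sin(2 * np.pi * 330 * t))
    delta = init_perturbation(len(dual))
    print(f"SNR of initial perturbation: {snr_db(dual, delta):.2f} dB")
```

Under this definition, a higher SNR means the perturbation carries less energy relative to the audio, which is the sense in which the reported 37.15 dB indicates an imperceptible attack.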

@article{fang2025_2504.04394,
  title={Selective Masking Adversarial Attack on Automatic Speech Recognition Systems},
  author={Zheng Fang and Shenyi Zhang and Tao Wang and Bowen Li and Lingchen Zhao and Zhangyi Wang},
  journal={arXiv preprint arXiv:2504.04394},
  year={2025}
}