
SpoofCeleb: Speech Deepfake Detection and SASV In The Wild

Abstract

This paper introduces SpoofCeleb, a dataset designed for Speech Deepfake Detection (SDD) and Spoofing-robust Automatic Speaker Verification (SASV), utilizing source data from real-world conditions and spoofing attacks generated by Text-To-Speech (TTS) systems trained on that same real-world data. Robust recognition systems require training speech recorded in varied acoustic environments with different levels of noise. However, current datasets typically include clean, high-quality recordings (bona fide data) because TTS training demands studio-quality or well-recorded read speech. Current SDD datasets also have limited usefulness for training SASV models due to insufficient speaker diversity. SpoofCeleb leverages a fully automated pipeline we developed that processes the VoxCeleb1 dataset into a form suitable for TTS training; we subsequently train 23 contemporary TTS systems. SpoofCeleb comprises over 2.5 million utterances from 1,251 unique speakers, collected under natural, real-world conditions. The dataset includes carefully partitioned training, validation, and evaluation sets with well-controlled experimental protocols. We present baseline results for both SDD and SASV tasks. All data, protocols, and baselines are publicly available at this https URL.
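The SDD and SASV baselines mentioned above are conventionally scored with the Equal Error Rate (EER): the operating point at which the false-acceptance rate (spoof or impostor trials accepted) equals the false-rejection rate (bona fide or target trials rejected). As a minimal illustrative sketch (not the paper's evaluation code), assuming higher scores indicate bona fide/target speech:

```python
def compute_eer(bona_scores, spoof_scores):
    """Approximate Equal Error Rate via a threshold sweep.

    bona_scores: detector scores for bona fide (or target) trials.
    spoof_scores: detector scores for spoofed (or impostor) trials.
    Returns the EER as a fraction in [0, 1].
    """
    best_gap, eer = None, None
    # Sweep every observed score as a candidate decision threshold.
    for t in sorted(set(bona_scores) | set(spoof_scores)):
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)  # spoofs accepted
        frr = sum(s < t for s in bona_scores) / len(bona_scores)     # bona fide rejected
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer
```

For perfectly separable scores the EER is 0; for fully overlapping scores it approaches 0.5 (50%). Production evaluations typically use an interpolated ROC-based EER rather than this discrete sweep.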

@article{jung2025_2409.17285,
  title={SpoofCeleb: Speech Deepfake Detection and SASV In The Wild},
  author={Jee-weon Jung and Yihan Wu and Xin Wang and Ji-Hoon Kim and Soumi Maiti and Yuta Matsunaga and Hye-jin Shim and Jinchuan Tian and Nicholas Evans and Joon Son Chung and Wangyou Zhang and Seyun Um and Shinnosuke Takamichi and Shinji Watanabe},
  journal={arXiv preprint arXiv:2409.17285},
  year={2025}
}