ArVoice: A Multi-Speaker Dataset for Arabic Speech Synthesis

Main: 4 Pages
2 Figures
Bibliography: 1 Page
5 Tables
Abstract

We introduce ArVoice, a multi-speaker Modern Standard Arabic (MSA) speech corpus with diacritized transcriptions. It is intended for multi-speaker speech synthesis and can also be useful for other tasks such as speech-based diacritic restoration, voice conversion, and deepfake detection. ArVoice comprises (1) a new professionally recorded set from six voice talents with diverse demographics; (2) a modified subset of the Arabic Speech Corpus; and (3) high-quality synthetic speech from two commercial systems. The complete corpus contains 83.52 hours of speech across 11 voices; around 10 hours are human speech from 7 speakers. We train three open-source TTS systems and two voice conversion systems to illustrate the dataset's use cases. The corpus is available for research use.

@article{toyin2025_2505.20506,
  title={ArVoice: A Multi-Speaker Dataset for Arabic Speech Synthesis},
  author={Hawau Olamide Toyin and Rufael Marew and Humaid Alblooshi and Samar M. Magdy and Hanan Aldarmaki},
  journal={arXiv preprint arXiv:2505.20506},
  year={2025}
}