India has 1,369 languages, of which 22 are official, and about 13 different scripts are used to write them. A Common Label Set (CLS), based on phonetics, was developed to address the large vocabulary of units that an End-to-End (E2E) framework would otherwise require for multilingual synthesis. Indian-language text is first converted to CLS. This approach enables seamless code-switching across 13 Indian languages and English in a given native speaker's voice, which matches everyday speech in the Indian subcontinent, where the population is multilingual.
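The text-to-CLS step can be illustrated with a minimal sketch. This is not the paper's actual label set or conversion procedure, which the abstract does not specify. It relies on one known property of Unicode: the major Indic script blocks share a parallel layout, so a character's offset within its 128-code-point block is roughly script-independent. The function name `to_common_labels` and the label strings in `TOY_LABELS` are hypothetical, and the sketch ignores inherent vowels, conjuncts, and script-specific exceptions that a phonetics-based CLS would have to handle.

```python
# Toy illustration only: NOT the paper's Common Label Set.
# Assumption: offsets within the parallel Indic Unicode blocks can stand in
# for script-independent labels.

# Start code point of each 128-code-point Unicode block.
INDIC_BLOCKS = {
    "devanagari": 0x0900,
    "bengali": 0x0980,
    "gurmukhi": 0x0A00,
    "gujarati": 0x0A80,
    "odia": 0x0B00,
    "tamil": 0x0B80,
    "telugu": 0x0C00,
    "kannada": 0x0C80,
    "malayalam": 0x0D00,
}

# Hypothetical label names for a few block offsets.
TOY_LABELS = {0x15: "ka", 0x2E: "ma", 0x32: "la", 0x3E: "aa"}


def to_common_labels(text):
    """Map each character to a script-independent toy label."""
    labels = []
    for ch in text:
        cp = ord(ch)
        for start in INDIC_BLOCKS.values():
            if start <= cp < start + 0x80:
                offset = cp - start
                # Unlisted offsets fall back to a generic hex label.
                labels.append(TOY_LABELS.get(offset, f"u{offset:02x}"))
                break
        else:
            # Characters outside the listed blocks (e.g. English) pass through.
            labels.append(ch)
    return labels


devanagari_word = "\u0915\u092E\u0932"  # Devanagari letters ka, ma, la
tamil_word = "\u0B95\u0BAE\u0BB2"       # Tamil letters ka, ma, la
print(to_common_labels(devanagari_word))
print(to_common_labels(tamil_word))
```

Both words yield the same label sequence, which is the property that lets a single vocabulary of units cover several scripts. Passing non-Indic characters through unchanged is only a stand-in for how English might be handled in code-switched input.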
@article{p2025_2410.10508,
  title={Everyday Speech in the Indian Subcontinent},
  author={Utkarsh P},
  journal={arXiv preprint arXiv:2410.10508},
  year={2025}
}