Findings of the 2023 ML-SUPERB Challenge: Pre-Training and Evaluation over More Languages and Beyond

The 2023 Multilingual Speech Universal Performance Benchmark (ML-SUPERB) Challenge expands upon the acclaimed SUPERB framework, emphasizing self-supervised models in multilingual speech recognition and language identification. The challenge comprises a Research Track focused on applying ML-SUPERB to specific multilingual topics, a Challenge Track for model submissions, and a New Language Track where language resource researchers can contribute and evaluate their low-resource language data in the context of the latest progress in multilingual speech recognition. The challenge garnered 12 model submissions and 54 language corpora, resulting in a comprehensive benchmark covering 154 languages. The findings indicate that merely scaling models is not the definitive solution for multilingual speech tasks, and that a variety of speech and voice types present significant challenges in multilingual speech processing.
@article{shi2025_2310.05513,
  title={Findings of the 2023 ML-SUPERB Challenge: Pre-Training and Evaluation over More Languages and Beyond},
  author={Jiatong Shi and William Chen and Dan Berrebbi and Hsiu-Hsuan Wang and Wei-Ping Huang and En-Pei Hu and Ho-Lam Chuang and Xuankai Chang and Yuxun Tang and Shang-Wen Li and Abdelrahman Mohamed and Hung-yi Lee and Shinji Watanabe},
  journal={arXiv preprint arXiv:2310.05513},
  year={2025}
}