
Voice Adaptation for Swiss German

Main: 4 pages
Figures: 1
Bibliography: 1 page
Tables: 8
Abstract

This work investigates the performance of Voice Adaptation models for Swiss German dialects, i.e., converting Standard German text into Swiss German dialect speech. To this end, we preprocess a large dataset of Swiss podcasts, which we automatically transcribe and annotate with dialect classes, yielding approximately 5,000 hours of weakly labeled training material. We fine-tune the XTTSv2 model on this dataset and show that it achieves good scores in both human and automated evaluations and correctly renders the desired dialect: the resulting model reaches CMOS scores of up to -0.28 and SMOS scores of 3.8. Our work is a step toward adapting Voice Cloning technology to underrepresented languages.
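The abstract reports results as CMOS (comparative mean opinion score) and SMOS (similarity mean opinion score). As a rough illustration of how such scores are typically aggregated, the sketch below averages listener ratings and attaches an approximate 95% confidence interval. It assumes one common convention (CMOS on a -3 to +3 scale comparing a synthesized sample against a reference, SMOS on a 1 to 5 speaker-similarity scale) and a normal-approximation interval; the paper's exact rating protocol may differ, and the ratings below are made up for illustration, not taken from the paper.

```python
from math import sqrt
from statistics import mean, stdev


def mean_opinion_score(ratings, z=1.96):
    """Return (mean, CI half-width) for a list of listener ratings.

    The half-width uses a normal approximation (z * s / sqrt(n));
    z=1.96 corresponds to an approximate 95% interval.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings to estimate a spread")
    m = mean(ratings)
    half_width = z * stdev(ratings) / sqrt(len(ratings))
    return m, half_width


# Hypothetical ratings for illustration only (not from the paper).
# CMOS: synthesized vs. reference on a -3..+3 scale; negative values mean
# listeners judged the synthesized sample worse than the reference.
cmos_ratings = [-1, 0, 0, -1, 1, 0, -2, 0]
# SMOS: similarity to the reference speaker on a 1..5 scale.
smos_ratings = [4, 4, 3, 5, 4, 3, 4, 4]

cmos, cmos_ci = mean_opinion_score(cmos_ratings)
smos, smos_ci = mean_opinion_score(smos_ratings)
print(f"CMOS {cmos:+.2f} ± {cmos_ci:.2f}, SMOS {smos:.2f} ± {smos_ci:.2f}")
```

Under this convention, a CMOS close to 0 means listeners found the synthesized speech roughly on par with the reference, which is why a negative but small value such as -0.28 is read as a modest gap.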

@article{stucki2025_2505.22054,
  title={Voice Adaptation for Swiss German},
  author={Samuel Stucki and Jan Deriu and Mark Cieliebak},
  journal={arXiv preprint arXiv:2505.22054},
  year={2025}
}