The Codec Language Model-based Zero-Shot Spontaneous Style TTS System for CoVoC Challenge 2024

2 December 2024
Shuoyi Zhou
Yixuan Zhou
Weiqin Li
Jun Chen
Runchuan Ye
Weihao Wu
Zijian Lin
Shun Lei
Zhiyong Wu
Abstract

This paper describes our zero-shot spontaneous-style TTS system for the ISCSLP 2024 Conversational Voice Clone Challenge (CoVoC). We propose a LLaMA-based codec language model with a delay pattern to achieve spontaneous-style voice cloning. To improve speech intelligibility, we introduce the Classifier-Free Guidance (CFG) strategy in the language model to strengthen conditional guidance on token prediction. To generate high-quality utterances, we apply effective data preprocessing and fine-tune our model on selected high-quality spontaneous speech data. The official evaluations in the CoVoC constrained track show that our system achieves the best speech naturalness MOS of 3.80 and obtains competitive results in speech quality and speaker similarity.
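The two mechanisms named in the abstract, the delay pattern and CFG, can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no formulas, so the code assumes the commonly used delay layout (codebook k is shifted right by k steps) and the common CFG interpolation on logits, `uncond + scale * (cond - uncond)`. The names `PAD`, `apply_delay_pattern`, `undo_delay_pattern`, and `cfg_logits` are hypothetical.

```python
from typing import List

PAD = -1  # hypothetical padding id; a real system would reserve a special token


def apply_delay_pattern(codes: List[List[int]], pad: int = PAD) -> List[List[int]]:
    """Shift codebook k right by k steps (assumed delay layout).

    `codes` holds K rows (one per codebook), each with T tokens.
    The result has K rows of length T + K - 1.
    """
    k = len(codes)
    return [[pad] * i + list(row) + [pad] * (k - 1 - i) for i, row in enumerate(codes)]


def undo_delay_pattern(delayed: List[List[int]]) -> List[List[int]]:
    """Invert apply_delay_pattern, recovering K aligned rows of length T."""
    k = len(delayed)
    t = len(delayed[0]) - (k - 1)
    return [row[i:i + t] for i, row in enumerate(delayed)]


def cfg_logits(cond: List[float], uncond: List[float], scale: float) -> List[float]:
    """Assumed CFG rule: move the unconditional logits toward the conditional ones.

    scale = 0 returns the unconditional logits, scale = 1 the conditional
    logits, and scale > 1 strengthens the conditional guidance.
    """
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]


if __name__ == "__main__":
    codes = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]  # toy example: 3 codebooks, 3 frames
    delayed = apply_delay_pattern(codes)
    for row in delayed:
        print(row)
    print(undo_delay_pattern(delayed) == codes)
    print(cfg_logits([2.0, 0.0], [1.0, 1.0], 2.0))
```

With the delay layout, the model can predict all codebooks in a single autoregressive pass while each later codebook still sees the earlier codebooks of the same frame. The guidance scale is a tunable value; the abstract does not state which one the system uses.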

@article{zhou2025_2412.01100,
  title={The Codec Language Model-based Zero-Shot Spontaneous Style TTS System for CoVoC Challenge 2024},
  author={Shuoyi Zhou and Yixuan Zhou and Weiqin Li and Jun Chen and Runchuan Ye and Weihao Wu and Zijian Lin and Shun Lei and Zhiyong Wu},
  journal={arXiv preprint arXiv:2412.01100},
  year={2024}
}