Koel-TTS: Enhancing LLM based Speech Generation with Preference Alignment and Classifier Free Guidance

7 February 2025
Shehzeen Samarah Hussain
Paarth Neekhara
Xuesong Yang
Edresson Casanova
Subhankar Ghosh
Mikyas T. Desta
Roy Fejgin
Rafael Valle
Jason Chun Lok Li
Abstract

While autoregressive speech token generation models produce speech with remarkable variety and naturalness, their inherent lack of controllability often results in issues such as hallucinations and undesired vocalizations that do not conform to conditioning inputs. We introduce Koel-TTS, a suite of enhanced encoder-decoder Transformer TTS models that address these challenges by incorporating preference alignment techniques guided by automatic speech recognition and speaker verification models. Additionally, we incorporate classifier-free guidance to further improve synthesis adherence to the transcript and reference speaker audio. Our experiments demonstrate that these optimizations significantly enhance target speaker similarity, intelligibility, and naturalness of synthesized speech. Notably, Koel-TTS directly maps text and context audio to acoustic tokens, and on the aforementioned metrics, outperforms state-of-the-art TTS models, despite being trained on a significantly smaller dataset. Audio samples and demos are available on our website.
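For readers unfamiliar with classifier-free guidance (CFG) in token-based generation, here is a minimal sketch of the commonly used logit-blending form. The abstract does not state the exact formulation Koel-TTS uses, so everything below is an assumption: the function names `cfg_logits` and `softmax`, the guidance `scale`, and the toy three-token vocabulary are all hypothetical. At each decoding step the model is run twice, once with the transcript and reference audio and once with that conditioning dropped, and the two logit vectors are combined before sampling.

```python
import math

def cfg_logits(cond_logits, uncond_logits, scale):
    """Generic CFG blend (an assumption, not the paper's stated formula):
    guided = uncond + scale * (cond - uncond).
    scale = 1.0 recovers the conditional logits; scale > 1.0 pushes the
    distribution further toward what the conditioning favours."""
    if len(cond_logits) != len(uncond_logits):
        raise ValueError("logit vectors must have the same length")
    return [u + scale * (c - u) for c, u in zip(cond_logits, uncond_logits)]

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy logits over a hypothetical 3-token acoustic vocabulary.
cond = [2.0, 1.0, 0.0]    # with transcript and reference audio
uncond = [1.0, 1.0, 1.0]  # conditioning dropped

guided = cfg_logits(cond, uncond, scale=2.0)
probs_cond = softmax(cond)
probs_guided = softmax(guided)
```

With a scale above 1, the token already favoured by the conditional pass gains probability mass relative to plain conditional sampling, which is the intuition behind using guidance to improve adherence to the transcript and reference speaker.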

@article{hussain2025_2502.05236,
  title={Koel-TTS: Enhancing LLM based Speech Generation with Preference Alignment and Classifier Free Guidance},
  author={Shehzeen Hussain and Paarth Neekhara and Xuesong Yang and Edresson Casanova and Subhankar Ghosh and Mikyas T. Desta and Roy Fejgin and Rafael Valle and Jason Li},
  journal={arXiv preprint arXiv:2502.05236},
  year={2025}
}