arXiv:2307.07062

Controllable Emphasis with zero data for text-to-speech

13 July 2023
Arnaud Joly
M. Nicolis
Ekaterina Peterova
Alessandro Lombardi
Ammar Abbas
Arent van Korlaar
A. Hussain
Parul Sharma
Alexis Moinet
Mateusz Lajszczak
Penny Karanasou
A. Bonafonte
Thomas Drugman
Elena Sokolova
Abstract

We present a scalable method to produce high-quality emphasis for text-to-speech (TTS) that does not require recordings or annotations. Many TTS models include a phoneme duration model. A simple but effective way to achieve emphasized speech is to increase the predicted duration of the emphasized word. We show that this is significantly better than spectrogram modification techniques, improving naturalness by 7.3% and testers' correct identification of the emphasized word in a sentence by 40% on a reference female en-US voice. We show that this technique significantly closes the gap to methods that require explicit recordings. The method proved to be scalable and was preferred in all four languages tested (English, Spanish, Italian, German), for different voices and multiple speaking styles.
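
The core idea in the abstract, lengthening the predicted durations of the emphasized word, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `emphasize_durations`, the per-phoneme `word_ids` alignment, and the default `scale` value are all assumptions made for illustration, and the abstract does not state which scaling factor was used.

```python
from typing import List, Sequence


def emphasize_durations(
    durations: Sequence[float],
    word_ids: Sequence[int],
    target_word: int,
    scale: float = 1.25,  # hypothetical factor; not taken from the paper
) -> List[float]:
    """Scale the predicted durations of every phoneme in one word.

    durations:   per-phoneme durations from a duration model (e.g. in frames)
    word_ids:    index of the word each phoneme belongs to (assumed alignment)
    target_word: index of the word to emphasize
    """
    if len(durations) != len(word_ids):
        raise ValueError("durations and word_ids must have the same length")
    if scale <= 0:
        raise ValueError("scale must be positive")
    # Phonemes of the target word are stretched; all others are left unchanged.
    return [
        d * scale if w == target_word else d
        for d, w in zip(durations, word_ids)
    ]


# Example: five phonemes spread over three words; emphasize word 1.
predicted = [5.0, 7.0, 6.0, 4.0, 8.0]
alignment = [0, 0, 1, 1, 2]
print(emphasize_durations(predicted, alignment, target_word=1, scale=1.5))
# prints [5.0, 7.0, 9.0, 6.0, 8.0]
```

In a real TTS pipeline the modified durations would be passed on to the acoustic model in place of the original predictions; how that hand-off works depends on the specific model and is not described in the abstract.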
