An overview of text-to-speech systems and media applications

22 October 2023
Mohammad Reza Hasanabadi
arXiv: 2310.14301
Abstract

Producing synthetic voices that sound human is an emerging feature of modern interactive media systems. Text-To-Speech (TTS) systems aim to generate authentic-sounding synthetic voices from text input. Moreover, well-known and familiar dubbing, announcing and narrating voices, which are valuable assets of any media organization, can be preserved indefinitely using TTS and Voice Conversion (VC) algorithms. The emergence of deep learning approaches has made such TTS systems more accurate and accessible. To understand TTS systems better, this paper investigates their key components, including text analysis, acoustic modelling and vocoding. The paper then provides details of important state-of-the-art TTS systems based on deep learning. Finally, a comparison is made between recently released systems in terms of backbone architecture, type of input and conversion, vocoder used and subjective assessment by Mean Opinion Score (MOS). According to this comparison, Tacotron 2, Transformer TTS, WaveNet and FastSpeech 1 are among the most successful TTS systems released to date. The discussion section offers some suggestions for developing a TTS system with regard to the intended application.
