Phoneme-Level BERT for Enhanced Prosody of Text-to-Speech with Grapheme Predictions

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

20 January 2023

Yinghao Aaron Li

Cong Han

Xilin Jiang

N. Mesgarani

Papers citing "Phoneme-Level BERT for Enhanced Prosody of Text-to-Speech with Grapheme Predictions"

SpeechWeave: Diverse Multilingual Synthetic Text & Audio Data Generation Pipeline for Training Text to Speech Models

Karan Dua

Puneet Mittal

Ranjeet Gupta

Hitesh Laxmichand Patel

DiffM

290

15 Sep 2025

Voice Cloning: Comprehensive Survey

Hussam Azzuni

Abdulmotaleb El Saddik

VLM

351

01 May 2025

Empowering Global Voices: A Data-Efficient, Phoneme-Tone Adaptive Approach to High-Fidelity Speech Synthesis

223

10 Apr 2025

Prosody-Enhanced Acoustic Pre-training and Acoustic-Disentangled Prosody Adapting for Movie Dubbing

Computer Vision and Pattern Recognition (CVPR), 2025

339

15 Mar 2025

From Babble to Words: Pre-Training Language Models on Continuous Streams of Phonemes

Zébulon Goriely

Richard Diehl Martinez

288

30 Oct 2024

Word-wise intonation model for cross-language TTS systems

Tomilov A. A.

Gromova A. Y.

Svischev A. N.

117

30 Sep 2024

StyleTTS-ZS: Efficient High-Quality Zero-Shot Text-to-Speech Synthesis with Distilled Time-Varying Style Diffusion

North American Chapter of the Association for Computational Linguistics (NAACL), 2024

277

16 Sep 2024

PRESENT: Zero-Shot Text-to-Prosody Control

IEEE Signal Processing Letters (SPL), 2024

Nancy F. Chen

184

13 Aug 2024

Llama-VITS: Enhancing TTS Synthesis with Semantic Awareness

International Conference on Language Resources and Evaluation (LREC), 2024

Xincan Feng

A. Yoshimoto

264

10 Apr 2024

ZMM-TTS: Zero-shot Multilingual and Multispeaker Speech Synthesis Conditioned on Self-supervised Discrete Speech Representations

Xin Wang

Longbiao Wang

262

22 Dec 2023

Multi-Modal Automatic Prosody Annotation with Contrastive Pretraining of SSWP

Interspeech (Interspeech), 2023

Zhiba Su

230

11 Sep 2023

StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models

Neural Information Processing Systems (NeurIPS), 2023

Cong Han

306

212

13 Jun 2023

XPhoneBERT: A Pre-trained Multilingual Model for Phoneme Representations for Text-to-Speech

Interspeech (Interspeech), 2023

L. T. Nguyen

Thinh-Le-Gia Pham

Dat Quoc Nguyen

250

31 May 2023