Phoneme-Level BERT for Enhanced Prosody of Text-to-Speech with Grapheme Predictions

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
20 January 2023
Yinghao Aaron Li
Cong Han
Xilin Jiang
N. Mesgarani
arXiv: 2301.08810

Papers citing "Phoneme-Level BERT for Enhanced Prosody of Text-to-Speech with Grapheme Predictions"

13 / 13 papers shown
SpeechWeave: Diverse Multilingual Synthetic Text & Audio Data Generation Pipeline for Training Text to Speech Models
Karan Dua
Puneet Mittal
Ranjeet Gupta
Hitesh Laxmichand Patel
DiffM
290
4
0
15 Sep 2025
Voice Cloning: Comprehensive Survey
Hussam Azzuni
Abdulmotaleb El Saddik
VLM
351
3
0
01 May 2025
Empowering Global Voices: A Data-Efficient, Phoneme-Tone Adaptive Approach to High-Fidelity Speech Synthesis
Yizhong Geng
Jizhuo Xu
Zeyu Liang
Jinghan Yang
Xiaoyi Shi
Xiaoyu Shen
223
0
0
10 Apr 2025
Prosody-Enhanced Acoustic Pre-training and Acoustic-Disentangled Prosody Adapting for Movie Dubbing
Computer Vision and Pattern Recognition (CVPR), 2025
Zhedong Zhang
Liang-Sheng Li
C. Yan
Chunshan Liu
Anton Van Den Hengel
Yuankai Qi
339
5
0
15 Mar 2025
From Babble to Words: Pre-Training Language Models on Continuous Streams of Phonemes
Zébulon Goriely
Richard Diehl Martinez
Andrew Caines
Lisa Beinborn
P. Buttery
CLL
288
8
0
30 Oct 2024
Word-wise intonation model for cross-language TTS systems
Tomilov A. A.
Gromova A. Y.
Svischev A. N.
117
0
0
30 Sep 2024
StyleTTS-ZS: Efficient High-Quality Zero-Shot Text-to-Speech Synthesis with Distilled Time-Varying Style Diffusion
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Yinghao Aaron Li
Xilin Jiang
Cong Han
N. Mesgarani
DiffM
277
10
0
16 Sep 2024
PRESENT: Zero-Shot Text-to-Prosody Control
IEEE Signal Processing Letters (SPL), 2024
Perry Lam
Huayun Zhang
Nancy F. Chen
Berrak Sisman
Dorien Herremans
184
2
0
13 Aug 2024
Llama-VITS: Enhancing TTS Synthesis with Semantic Awareness
International Conference on Language Resources and Evaluation (LREC), 2024
Xincan Feng
A. Yoshimoto
264
4
0
10 Apr 2024
ZMM-TTS: Zero-shot Multilingual and Multispeaker Speech Synthesis Conditioned on Self-supervised Discrete Speech Representations
Cheng Gong
Xin Wang
Erica Cooper
Dan Wells
Longbiao Wang
Jianwu Dang
Korin Richmond
Junichi Yamagishi
262
36
0
22 Dec 2023
Multi-Modal Automatic Prosody Annotation with Contrastive Pretraining of SSWP
Interspeech, 2023
Jinzuomu Zhong
Yang Li
Hui Huang
Korin Richmond
J. Tang
Zhiba Su
Jing Guo
Benlai Tang
Fengjie Zhu
230
0
0
11 Sep 2023
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
Neural Information Processing Systems (NeurIPS), 2023
Yinghao Aaron Li
Cong Han
Vinay S. Raghavan
Gavin Mischler
N. Mesgarani
VLM
DiffM
306
212
0
13 Jun 2023
XPhoneBERT: A Pre-trained Multilingual Model for Phoneme Representations for Text-to-Speech
Interspeech, 2023
L. T. Nguyen
Thinh-Le-Gia Pham
Dat Quoc Nguyen
250
23
0
31 May 2023