StyleSpeech: Self-supervised Style Enhancing with VQ-VAE-based Pre-training for Expressive Audiobook Speech Synthesis

19 December 2023

Zhiyong Wu

Papers citing "StyleSpeech: Self-supervised Style Enhancing with VQ-VAE-based Pre-training for Expressive Audiobook Speech Synthesis"

Title
See the Speaker: Crafting High-Resolution Talking Faces from Speech with Prior Guidance and Region Refinement | IEEE Transactions on Audio, Speech, and Language Processing (TASLP), 2025 | Jinting Wang, Jun Wang, Hei Victor Cheng, Li Liu | DiffM | 68 0 0 | 28 Oct 2025
DiffDSR: Dysarthric Speech Reconstruction Using Latent Diffusion Model | Xueyuan Chen, Dongchao Yang, Wenxuan Wu, Minglin Wu, Jing Xu, Xixin Wu, Zhiyong Wu, Helen M. Meng | DiffM | 175 1 0 | 31 May 2025
CoLM-DSR: Leveraging Neural Codec Language Modeling for Multi-Modal Dysarthric Speech Reconstruction | Xueyuan Chen, Dongchao Yang, Dingdong Wang, Xixin Wu, Zhiyong Wu, Helen Meng | 161 1 0 | 12 Jun 2024
Retrieval Augmented Generation in Prompt-based Text-to-Speech Synthesis with Context-Aware Contrastive Language-Audio Pretraining | Jinlong Xue, Yayue Deng, Yingming Gao, Ya Li | RALM, VLM | 273 14 0 | 06 Jun 2024
Style Mixture of Experts for Expressive Text-To-Speech Synthesis | Ahad Jawaid, Shreeram Suresh Chandra, Junchen Lu, Berrak Sisman | MoE | 210 6 0 | 05 Jun 2024
Target Speech Extraction with Pre-trained AV-HuBERT and Mask-And-Recover Strategy | Wenxuan Wu, Xueyuan Chen, Xixin Wu, Haizhou Li, Helen M. Meng | 163 6 0 | 24 Mar 2024
Exploiting Audio-Visual Features with Pretrained AV-HuBERT for Multi-Modal Dysarthric Speech Reconstruction | Xueyuan Chen, Yuejiao Wang, Xixin Wu, Disong Wang, Zhiyong Wu, Xunying Liu, Helen M. Meng | 116 8 0 | 31 Jan 2024
Expressive paragraph text-to-speech synthesis with multi-step variational autoencoder | Interspeech (Interspeech), 2023 | Xuyuan Li, Zengqiang Shang, Peiyang Shi, Hua Hua, Jian Liu, Pengyuan Zhang | 229 0 0 | 25 Aug 2023