StyleSpeech: Self-supervised Style Enhancing with VQ-VAE-based Pre-training for Expressive Audiobook Speech Synthesis

19 December 2023
Xueyuan Chen
Xi Wang
Shaofei Zhang
Lei He
Zhiyong Wu
Xixin Wu
Helen M. Meng

Papers citing "StyleSpeech: Self-supervised Style Enhancing with VQ-VAE-based Pre-training for Expressive Audiobook Speech Synthesis"

8 / 8 papers shown
See the Speaker: Crafting High-Resolution Talking Faces from Speech with Prior Guidance and Region Refinement
IEEE Transactions on Audio, Speech, and Language Processing (TASLP), 2025
Jinting Wang
Jun Wang
Hei Victor Cheng
Li Liu
DiffM
68
0
0
28 Oct 2025
DiffDSR: Dysarthric Speech Reconstruction Using Latent Diffusion Model
Xueyuan Chen
Dongchao Yang
Wenxuan Wu
Minglin Wu
Jing Xu
Xixin Wu
Zhiyong Wu
Helen M. Meng
DiffM
175
1
0
31 May 2025
CoLM-DSR: Leveraging Neural Codec Language Modeling for Multi-Modal Dysarthric Speech Reconstruction
Xueyuan Chen
Dongchao Yang
Dingdong Wang
Xixin Wu
Zhiyong Wu
Helen Meng
161
1
0
12 Jun 2024
Retrieval Augmented Generation in Prompt-based Text-to-Speech Synthesis with Context-Aware Contrastive Language-Audio Pretraining
Jinlong Xue
Yayue Deng
Yingming Gao
Ya Li
RALM, VLM
273
14
0
06 Jun 2024
Style Mixture of Experts for Expressive Text-To-Speech Synthesis
Ahad Jawaid
Shreeram Suresh Chandra
Junchen Lu
Berrak Sisman
MoE
210
6
0
05 Jun 2024
Target Speech Extraction with Pre-trained AV-HuBERT and Mask-And-Recover Strategy
Wenxuan Wu
Xueyuan Chen
Xixin Wu
Haizhou Li
Helen M. Meng
163
6
0
24 Mar 2024
Exploiting Audio-Visual Features with Pretrained AV-HuBERT for Multi-Modal Dysarthric Speech Reconstruction
Xueyuan Chen
Yuejiao Wang
Xixin Wu
Disong Wang
Zhiyong Wu
Xunying Liu
Helen M. Meng
116
8
0
31 Jan 2024
Expressive paragraph text-to-speech synthesis with multi-step variational autoencoder
Interspeech, 2023
Xuyuan Li
Zengqiang Shang
Peiyang Shi
Hua Hua
Jian Liu
Pengyuan Zhang
229
0
0
25 Aug 2023