ResearchTrend.AI
Generating diverse and natural text-to-speech samples using a quantized fine-grained VAE and auto-regressive prosody prior

6 February 2020
Guangzhi Sun
Yu Zhang
Ron J. Weiss
Yuan Cao
Heiga Zen
Andrew Rosenberg
Bhuvana Ramabhadran
Yonghui Wu
    DiffM

Papers citing "Generating diverse and natural text-to-speech samples using a quantized fine-grained VAE and auto-regressive prosody prior"

10 / 60 papers shown
End-to-End Text-to-Speech using Latent Duration based on VQ-VAE
Yusuke Yasuda
Xin Wang
Junichi Yamagishi
13
16
0
19 Oct 2020
Hierarchical Multi-Grained Generative Model for Expressive Speech Synthesis
Yukiya Hono
Kazuna Tsuboi
Kei Sawada
Kei Hashimoto
Keiichiro Oura
Yoshihiko Nankaku
K. Tokuda
BDL
11
24
0
17 Sep 2020
Prosody Learning Mechanism for Speech Synthesis System Without Text Length Limit
Zhen Zeng
Jianzong Wang
Ning Cheng
Jing Xiao
11
8
0
13 Aug 2020
Modeling Prosodic Phrasing with Multi-Task Learning in Tacotron-based TTS
Rui Liu
Berrak Sisman
F. Bao
Guanglai Gao
Haizhou Li
9
17
0
11 Aug 2020
Expressive TTS Training with Frame and Style Reconstruction Loss
Rui Liu
Berrak Sisman
Guanglai Gao
Haizhou Li
24
73
0
04 Aug 2020
Exploiting Deep Sentential Context for Expressive End-to-End Speech Synthesis
Fengyu Yang
Shan Yang
Qinghua Wu
Yujun Wang
Lei Xie
11
5
0
03 Aug 2020
Pitchtron: Towards audiobook generation from ordinary people's voices
Sunghee Jung
Hoi-Rim Kim
11
5
0
21 May 2020
Attentron: Few-Shot Text-to-Speech Utilizing Attention-Based Variable-Length Embedding
Seungwoo Choi
Seungju Han
Dongyoung Kim
S. Ha
24
65
0
18 May 2020
Semi-supervised Learning for Multi-speaker Text-to-speech Synthesis Using Discrete Speech Representation
Tao Tu
Yuan-Jui Chen
Alexander H. Liu
Hung-yi Lee
25
7
0
16 May 2020
You Do Not Need More Data: Improving End-To-End Speech Recognition by Text-To-Speech Data Augmentation
A. Laptev
Roman Korostik
A. Svischev
A. Andrusenko
Ivan Medennikov
S. Rybin
16
61
0
14 May 2020