ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2002.03788
  4. Cited By
Generating diverse and natural text-to-speech samples using a quantized
  fine-grained VAE and auto-regressive prosody prior

Generating diverse and natural text-to-speech samples using a quantized fine-grained VAE and auto-regressive prosody prior

6 February 2020
Guangzhi Sun
Yu Zhang
Ron J. Weiss
Yuan Cao
Heiga Zen
Andrew Rosenberg
Bhuvana Ramabhadran
Yonghui Wu
    DiffM
ArXivPDFHTML

Papers citing "Generating diverse and natural text-to-speech samples using a quantized fine-grained VAE and auto-regressive prosody prior"

50 / 60 papers shown
Title
An Exhaustive Evaluation of TTS- and VC-based Data Augmentation for ASR
Sewade Ogun
Vincent Colotte
Emmanuel Vincent
59
0
0
11 Mar 2025
$α$-TCVAE: On the relationship between Disentanglement and
  Diversity
ααα-TCVAE: On the relationship between Disentanglement and Diversity
Cristian Meo
Louis Mahon
Anirudh Goyal
Justin Dauwels
DRL
61
8
0
01 Nov 2024
Investigating Disentanglement in a Phoneme-level Speech Codec for
  Prosody Modeling
Investigating Disentanglement in a Phoneme-level Speech Codec for Prosody Modeling
Sotirios Karapiperis
Nikolaos Ellinas
Alexandra Vioni
Junkwang Oh
Gunu Jho
Inchul Hwang
S. Raptis
33
0
0
13 Sep 2024
VQUNet: Vector Quantization U-Net for Defending Adversarial Atacks by
  Regularizing Unwanted Noise
VQUNet: Vector Quantization U-Net for Defending Adversarial Atacks by Regularizing Unwanted Noise
Zhixun He
Mukesh Singhal
28
1
0
05 Jun 2024
CLaM-TTS: Improving Neural Codec Language Model for Zero-Shot
  Text-to-Speech
CLaM-TTS: Improving Neural Codec Language Model for Zero-Shot Text-to-Speech
Jaehyeon Kim
Keon Lee
Seungjun Chung
Jaewoong Cho
70
39
0
03 Apr 2024
MM-TTS: Multi-modal Prompt based Style Transfer for Expressive
  Text-to-Speech Synthesis
MM-TTS: Multi-modal Prompt based Style Transfer for Expressive Text-to-Speech Synthesis
Wenhao Guan
Yishuang Li
Tao Li
Hukai Huang
Feng Wang
Jiayan Lin
Lingyan Huang
Lin Li
Q. Hong
23
8
0
17 Dec 2023
Improving End-to-End Speech Processing by Efficient Text Data
  Utilization with Latent Synthesis
Improving End-to-End Speech Processing by Efficient Text Data Utilization with Latent Synthesis
Jianqiao Lu
Wenyong Huang
Nianzu Zheng
Xingshan Zeng
Y. Yeung
Xiao Chen
SyDa
19
1
0
09 Oct 2023
Cross-Utterance Conditioned VAE for Speech Generation
Cross-Utterance Conditioned VAE for Speech Generation
Y. Li
Cheng Yu
Guangzhi Sun
Weiqin Zu
Zheng Tian
...
Wei Pan
Chao Zhang
Jun Wang
Yang Yang
Fanglei Sun
11
2
0
08 Sep 2023
Variational latent discrete representation for time series modelling
Variational latent discrete representation for time series modelling
Max H. Cohen
M. Charbit
Sylvain Le Corff
25
1
0
27 Jun 2023
Interpretable Style Transfer for Text-to-Speech with ControlVAE and
  Diffusion Bridge
Interpretable Style Transfer for Text-to-Speech with ControlVAE and Diffusion Bridge
Wenhao Guan
Tao Li
Yishuang Li
Hukai Huang
Q. Hong
Lin Li
DiffM
27
6
0
07 Jun 2023
PITS: Variational Pitch Inference without Fundamental Frequency for
  End-to-End Pitch-controllable TTS
PITS: Variational Pitch Inference without Fundamental Frequency for End-to-End Pitch-controllable TTS
Junhyeok Lee
Wonbin Jung
Hyunjae Cho
Jaeyeon Kim
Jaehwan Kim
17
3
0
24 Feb 2023
MAC: A unified framework boosting low resource automatic speech
  recognition
MAC: A unified framework boosting low resource automatic speech recognition
Zeping Min
Qian Ge
Zhong Li
E. Weinan
13
1
0
05 Feb 2023
Time out of Mind: Generating Rate of Speech conditioned on emotion and
  speaker
Time out of Mind: Generating Rate of Speech conditioned on emotion and speaker
Navjot Kaur
Paige Tuttosi
19
2
0
29 Jan 2023
Style-Label-Free: Cross-Speaker Style Transfer by Quantized VAE and
  Speaker-wise Normalization in Speech Synthesis
Style-Label-Free: Cross-Speaker Style Transfer by Quantized VAE and Speaker-wise Normalization in Speech Synthesis
Chunyu Qiang
Peng Yang
Hao Che
Xiaorui Wang
Zhongyuan Wang
BDL
21
6
0
13 Dec 2022
Controllable speech synthesis by learning discrete phoneme-level
  prosodic representations
Controllable speech synthesis by learning discrete phoneme-level prosodic representations
Nikolaos Ellinas
Myrsini Christidou
Alexandra Vioni
June Sig Sung
Aimilios Chalamandaris
Pirros Tsiakoulis
P. Mastorocostas
17
7
0
29 Nov 2022
Prosody-controllable spontaneous TTS with neural HMMs
Prosody-controllable spontaneous TTS with neural HMMs
Harm Lameris
Shivam Mehta
G. Henter
Joakim Gustafson
Éva Székely
33
15
0
24 Nov 2022
Predicting phoneme-level prosody latents using AR and flow-based Prior
  Networks for expressive speech synthesis
Predicting phoneme-level prosody latents using AR and flow-based Prior Networks for expressive speech synthesis
Konstantinos Klapsas
Karolos Nikitaras
Nikolaos Ellinas
June Sig Sung
Inchul Hwang
S. Raptis
Aimilios Chalamandaris
Pirros Tsiakoulis
19
0
0
02 Nov 2022
Learning utterance-level representations through token-level acoustic
  latents prediction for Expressive Speech Synthesis
Learning utterance-level representations through token-level acoustic latents prediction for Expressive Speech Synthesis
Karolos Nikitaras
Konstantinos Klapsas
Nikolaos Ellinas
Georgia Maniati
June Sig Sung
Inchul Hwang
S. Raptis
Aimilios Chalamandaris
Pirros Tsiakoulis
14
0
0
01 Nov 2022
Speech Synthesis with Mixed Emotions
Speech Synthesis with Mixed Emotions
Kun Zhou
Berrak Sisman
R. Rana
B.W.Schuller
Haizhou Li
14
43
0
11 Aug 2022
Self-supervised Context-aware Style Representation for Expressive Speech
  Synthesis
Self-supervised Context-aware Style Representation for Expressive Speech Synthesis
Yihan Wu
Xi Wang
S. Zhang
Lei He
Ruihua Song
J. Nie
34
15
0
25 Jun 2022
GenerSpeech: Towards Style Transfer for Generalizable Out-Of-Domain
  Text-to-Speech
GenerSpeech: Towards Style Transfer for Generalizable Out-Of-Domain Text-to-Speech
Rongjie Huang
Yi Ren
Jinglin Liu
Chenye Cui
Zhou Zhao
OODD
VLM
115
34
0
15 May 2022
Read the Room: Adapting a Robot's Voice to Ambient and Social Contexts
Read the Room: Adapting a Robot's Voice to Ambient and Social Contexts
Paige Tuttosi
Emma Hughson
Akihiro Matsufuji
Angelica Lim
20
4
0
10 May 2022
Cross-Utterance Conditioned VAE for Non-Autoregressive Text-to-Speech
Cross-Utterance Conditioned VAE for Non-Autoregressive Text-to-Speech
Y. Li
Cheng Yu
Guangzhi Sun
Hua Jiang
Fanglei Sun
Weiqin Zu
Ying Wen
Yang Yang
Jun Wang
17
7
0
09 May 2022
Layer-wise Fast Adaptation for End-to-End Multi-Accent Speech
  Recognition
Layer-wise Fast Adaptation for End-to-End Multi-Accent Speech Recognition
Xun Gong
Y. Qian
Houjun Huang
Yanmin Qian
26
44
0
21 Apr 2022
Hierarchical and Multi-Scale Variational Autoencoder for Diverse and
  Natural Non-Autoregressive Text-to-Speech
Hierarchical and Multi-Scale Variational Autoencoder for Diverse and Natural Non-Autoregressive Text-to-Speech
Jaesung Bae
Jinhyeok Yang
Taejun Bak
Young-Sun Joo
DiffM
19
6
0
08 Apr 2022
Unsupervised word-level prosody tagging for controllable speech
  synthesis
Unsupervised word-level prosody tagging for controllable speech synthesis
Yiwei Guo
Chenpeng Du
Kai Yu
13
15
0
15 Feb 2022
Disentangling Style and Speaker Attributes for TTS Style Transfer
Disentangling Style and Speaker Attributes for TTS Style Transfer
Xiaochun An
Frank Soong
Lei Xie
54
18
0
24 Jan 2022
Prosodic Clustering for Phoneme-level Prosody Control in End-to-End
  Speech Synthesis
Prosodic Clustering for Phoneme-level Prosody Control in End-to-End Speech Synthesis
Alexandra Vioni
Myrsini Christidou
Nikolaos Ellinas
G. Vamvoukakis
Panos Kakoulidis
Taehoon Kim
June Sig Sung
Hyoungmin Park
Aimilios Chalamandaris
Pirros Tsiakoulis
11
11
0
19 Nov 2021
Word-Level Style Control for Expressive, Non-attentive Speech Synthesis
Word-Level Style Control for Expressive, Non-attentive Speech Synthesis
Konstantinos Klapsas
Nikolaos Ellinas
June Sig Sung
Hyoungmin Park
S. Raptis
22
9
0
19 Nov 2021
Improved Prosodic Clustering for Multispeaker and Speaker-independent
  Phoneme-level Prosody Control
Improved Prosodic Clustering for Multispeaker and Speaker-independent Phoneme-level Prosody Control
Myrsini Christidou
Alexandra Vioni
Nikolaos Ellinas
G. Vamvoukakis
K. Markopoulos
Panos Kakoulidis
June Sig Sung
Hyoungmin Park
Aimilios Chalamandaris
Pirros Tsiakoulis
19
4
0
19 Nov 2021
DelightfulTTS: The Microsoft Speech Synthesis System for Blizzard
  Challenge 2021
DelightfulTTS: The Microsoft Speech Synthesis System for Blizzard Challenge 2021
Yanqing Liu
Rui Shao
G. Wang
Kuan Chen
Bohan Li
P. Yuen
Jinzhu Li
Lei He
Sheng Zhao
32
55
0
25 Oct 2021
Discrete Acoustic Space for an Efficient Sampling in Neural
  Text-To-Speech
Discrete Acoustic Space for an Efficient Sampling in Neural Text-To-Speech
Mu-Wei Li
Jonas Rohnke
A. Bonafonte
Mateusz Lajszczak
Trevor Wood
DRL
17
2
0
24 Oct 2021
Improving Emotional Speech Synthesis by Using SUS-Constrained VAE and
  Text Encoder Aggregation
Improving Emotional Speech Synthesis by Using SUS-Constrained VAE and Text Encoder Aggregation
Fengyu Yang
Jian Luan
Yujun Wang
30
5
0
19 Oct 2021
Style Equalization: Unsupervised Learning of Controllable Generative
  Sequence Models
Style Equalization: Unsupervised Learning of Controllable Generative Sequence Models
Jen-Hao Rick Chang
A. Shrivastava
H. Koppula
Xiaoshuai Zhang
Oncel Tuzel
DiffM
51
16
0
06 Oct 2021
Unet-TTS: Improving Unseen Speaker and Style Transfer in One-shot Voice
  Cloning
Unet-TTS: Improving Unseen Speaker and Style Transfer in One-shot Voice Cloning
Rui Li
dong Pu
Minnie Huang
Bill Huang
50
14
0
23 Sep 2021
Multi-Scale Spectrogram Modelling for Neural Text-to-Speech
Multi-Scale Spectrogram Modelling for Neural Text-to-Speech
Ammar Abbas
Bajibabu Bollepalli
Alexis Moinet
Arnaud Joly
Penny Karanasou
Peter Makarov
Simon Slangens
S. Karlapati
Thomas Drugman
16
0
0
29 Jun 2021
A Survey on Neural Speech Synthesis
A Survey on Neural Speech Synthesis
Xu Tan
Tao Qin
Frank Soong
Tie-Yan Liu
AI4TS
18
352
0
29 Jun 2021
FastPitchFormant: Source-filter based Decomposed Modeling for Speech
  Synthesis
FastPitchFormant: Source-filter based Decomposed Modeling for Speech Synthesis
Taejun Bak
Jaesung Bae
Hanbin Bae
Young-Ik Kim
Hoon-Young Cho
20
16
0
29 Jun 2021
Preliminary study on using vector quantization latent spaces for TTS/VC
  systems with consistent performance
Preliminary study on using vector quantization latent spaces for TTS/VC systems with consistent performance
Hieu-Thi Luong
Junichi Yamagishi
17
0
0
25 Jun 2021
Improving Performance of Seen and Unseen Speech Style Transfer in
  End-to-end Neural TTS
Improving Performance of Seen and Unseen Speech Style Transfer in End-to-end Neural TTS
Xiaochun An
Frank Soong
Lei Xie
34
9
0
18 Jun 2021
Phone-Level Prosody Modelling with GMM-Based MDN for Diverse and
  Controllable Speech Synthesis
Phone-Level Prosody Modelling with GMM-Based MDN for Diverse and Controllable Speech Synthesis
Chenpeng Du
K. Yu
17
18
0
27 May 2021
EAT: Enhanced ASR-TTS for Self-supervised Speech Recognition
EAT: Enhanced ASR-TTS for Self-supervised Speech Recognition
M. Baskar
L. Burget
Shinji Watanabe
Ramón Fernández Astudillo
J. Černocký
12
26
0
13 Apr 2021
AdaSpeech: Adaptive Text to Speech for Custom Voice
AdaSpeech: Adaptive Text to Speech for Custom Voice
Mingjian Chen
Xu Tan
Bohan Li
Yanqing Liu
Tao Qin
Sheng Zhao
Tie-Yan Liu
VLM
DiffM
23
187
0
01 Mar 2021
AISPEECH-SJTU accent identification system for the Accented English
  Speech Recognition Challenge
AISPEECH-SJTU accent identification system for the Accented English Speech Recognition Challenge
Houjun Huang
Xu Xiang
Yexin Yang
Rao Ma
Y. Qian
11
25
0
19 Feb 2021
Rich Prosody Diversity Modelling with Phone-level Mixture Density
  Network
Rich Prosody Diversity Modelling with Phone-level Mixture Density Network
Chenpeng Du
K. Yu
28
17
0
01 Feb 2021
Few Shot Adaptive Normalization Driven Multi-Speaker Speech Synthesis
Few Shot Adaptive Normalization Driven Multi-Speaker Speech Synthesis
Neeraj Kumar
Srishti Goel
Ankur Narang
Brejesh Lall
19
5
0
14 Dec 2020
Hierarchical Prosody Modeling for Non-Autoregressive Speech Synthesis
Hierarchical Prosody Modeling for Non-Autoregressive Speech Synthesis
C. Chien
Hung-yi Lee
19
36
0
12 Nov 2020
Wave-Tacotron: Spectrogram-free end-to-end text-to-speech synthesis
Wave-Tacotron: Spectrogram-free end-to-end text-to-speech synthesis
Ron J. Weiss
RJ Skerry-Ryan
Eric Battenberg
Soroosh Mariooryad
Diederik P. Kingma
19
97
0
06 Nov 2020
Improving Prosody Modelling with Cross-Utterance BERT Embeddings for
  End-to-end Speech Synthesis
Improving Prosody Modelling with Cross-Utterance BERT Embeddings for End-to-end Speech Synthesis
Guanghui Xu
Wei Song
Zhengchen Zhang
Chao Zhang
Xiaodong He
Bowen Zhou
11
50
0
06 Nov 2020
AISHELL-3: A Multi-speaker Mandarin TTS Corpus and the Baselines
AISHELL-3: A Multi-speaker Mandarin TTS Corpus and the Baselines
Yao Shi
Hui Bu
Xin Xu
Shaojing Zhang
Ming Li
16
218
0
22 Oct 2020
12
Next