arXiv:1810.07217
Hierarchical Generative Modeling for Controllable Speech Synthesis
16 October 2018
Wei-Ning Hsu
Yu Zhang
Ron J. Weiss
Heiga Zen
Yonghui Wu
Yuxuan Wang
Yuan Cao
Ye Jia
Zhiwen Chen
Jonathan Shen
Patrick Nguyen
Ruoming Pang
BDL
Papers citing "Hierarchical Generative Modeling for Controllable Speech Synthesis"
50 / 178 papers shown
A learned conditional prior for the VAE acoustic space of a TTS system
Panagiota Karanasou
S. Karlapati
Alexis Moinet
Arnaud Joly
Ammar Abbas
Simon Slangen
Jaime Lorenzo-Trueba
Thomas Drugman
74
7
0
14 Jun 2021
Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
Jaehyeon Kim
Jungil Kong
Juhee Son
DRL
167
902
0
11 Jun 2021
NWT: Towards natural audio-to-video generation with representation learning
Rayhane Mama
Marc S. Tyndel
Hashiam Kadhim
Cole Clifford
Ragavan Thurairatnam
VGen
112
12
0
08 Jun 2021
LipSync3D: Data-Efficient Learning of Personalized 3D Talking Faces from Video using Pose and Lighting Normalization
A. Lahiri
Vivek Kwatra
C. Frueh
J. P. Lewis
C. Bregler
3DH
86
102
0
08 Jun 2021
Meta-StyleSpeech : Multi-Speaker Adaptive Text-to-Speech Generation
Dong Min
Dong Bok Lee
Eunho Yang
Sung Ju Hwang
134
175
0
06 Jun 2021
Learning Robust Latent Representations for Controllable Speech Synthesis
Shakti Kumar
Jithin Pradeep
Hussain Zaidi
DRL
68
6
0
10 May 2021
MASS: Multi-task Anthropomorphic Speech Synthesis Framework
Jinyin Chen
Linhui Ye
Zhaoyan Ming
65
7
0
10 May 2021
Reinforcement Learning for Emotional Text-to-Speech Synthesis with Improved Emotion Discriminability
Rui Liu
Berrak Sisman
Haizhou Li
69
32
0
03 Apr 2021
STYLER: Style Factor Modeling with Rapidity and Robustness via Speech Decomposition for Expressive and Controllable Neural Text to Speech
Keon Lee
Kyumin Park
Daeyoung Kim
69
32
0
17 Mar 2021
Signal Representations for Synthesizing Audio Textures with Generative Adversarial Networks
Chitralekha Gupta
Purnima Kamath
L. Wyse
49
9
0
12 Mar 2021
Analysis and Assessment of Controllability of an Expressive Deep Learning-based TTS system
Noé Tits
Kevin El Haddad
Thierry Dutoit
69
5
0
06 Mar 2021
Investigating on Incorporating Pretrained and Learnable Speaker Representations for Multi-Speaker Multi-Style Text-to-Speech
C. Chien
Jheng-hao Lin
Chien-yu Huang
Po-Chun Hsu
Hung-yi Lee
119
70
0
06 Mar 2021
Disentangled Sequence Clustering for Human Intention Inference
Mark Zolotas
Y. Demiris
DRL
82
5
0
23 Jan 2021
Hierarchical disentangled representation learning for singing voice conversion
Naoya Takahashi
M. Singh
Yuki Mitsufuji
DRL
60
14
0
18 Jan 2021
Text-Free Image-to-Speech Synthesis Using Learned Segmental Units
Wei-Ning Hsu
David Harwath
Christopher Song
James R. Glass
CLIP
90
67
0
31 Dec 2020
DenoiSpeech: Denoising Text to Speech with Frame-Level Noise Modeling
Chen Zhang
Yi Ren
Xu Tan
Jinglin Liu
Ke-jun Zhang
Tao Qin
Sheng Zhao
Tie-Yan Liu
DiffM
97
38
0
17 Dec 2020
Measuring Disentanglement: A Review of Metrics
M. Carbonneau
Julian Zaïdi
Jonathan Boilard
G. Gagnon
CoGe
DRL
89
85
0
16 Dec 2020
Few Shot Adaptive Normalization Driven Multi-Speaker Speech Synthesis
Neeraj Kumar
Srishti Goel
Ankur Narang
Brejesh Lall
68
5
0
14 Dec 2020
Synth2Aug: Cross-domain speaker recognition with TTS synthesized speech
Yiling Huang
Yutian Chen
Jason W. Pelecanos
Quan Wang
98
12
0
24 Nov 2020
Hierarchical Prosody Modeling for Non-Autoregressive Speech Synthesis
C. Chien
Hung-yi Lee
91
36
0
12 Nov 2020
Pretraining Strategies, Waveform Model Choice, and Acoustic Configurations for Multi-Speaker End-to-End Speech Synthesis
Erica Cooper
Xin Wang
Yi Zhao
Yusuke Yasuda
Junichi Yamagishi
SyDa
50
3
0
10 Nov 2020
Fine-grained Style Modeling, Transfer and Prediction in Text-to-Speech Synthesis via Phone-Level Content-Style Disentanglement
Daxin Tan
Tan Lee
116
21
0
08 Nov 2020
Wave-Tacotron: Spectrogram-free end-to-end text-to-speech synthesis
Ron J. Weiss
RJ Skerry-Ryan
Eric Battenberg
Soroosh Mariooryad
Diederik P. Kingma
99
101
0
06 Nov 2020
Improving Prosody Modelling with Cross-Utterance BERT Embeddings for End-to-end Speech Synthesis
Guanghui Xu
Wei Song
Zhengchen Zhang
Chao Zhang
Xiaodong He
Bowen Zhou
62
50
0
06 Nov 2020
Speech Synthesis and Control Using Differentiable DSP
Giorgio Fabbro
Vladimir Golkov
Thomas Kemp
Zorah Lähner
78
12
0
28 Oct 2020
Unsupervised Learning of Disentangled Speech Content and Style Representation
Andros Tjandra
Ruoming Pang
Yu Zhang
Shigeki Karita
BDL
DRL
73
15
0
24 Oct 2020
GraphSpeech: Syntax-Aware Graph Attention Network For Neural Speech Synthesis
Rui Liu
Berrak Sisman
Haizhou Li
96
25
0
23 Oct 2020
Parallel Tacotron: Non-Autoregressive and Controllable TTS
Isaac Elias
Heiga Zen
Jonathan Shen
Yu Zhang
Ye Jia
Ron J. Weiss
Yonghui Wu
DRL
76
103
0
22 Oct 2020
Learning Speaker Embedding from Text-to-Speech
Jaejin Cho
Piotr Żelasko
Jesus Villalba
Shinji Watanabe
Najim Dehak
66
11
0
21 Oct 2020
The Sequence-to-Sequence Baseline for the Voice Conversion Challenge 2020: Cascading ASR and TTS
Wen-Chin Huang
Tomoki Hayashi
Shinji Watanabe
Tomoki Toda
DRL
81
40
0
06 Oct 2020
Controllable Neural Prosody Synthesis
Max Morrison
Zeyu Jin
Justin Salamon
Nicholas J. Bryan
G. J. Mysore
57
20
0
07 Aug 2020
One Model, Many Languages: Meta-learning for Multilingual Text-to-Speech
Tomás Nekvinda
Ondrej Dusek
72
57
0
03 Aug 2020
Music FaderNets: Controllable Music Generation Based On High-Level Features via Low-Level Feature Modelling
Hao Hao Tan
Dorien Herremans
MGen
60
74
0
29 Jul 2020
Generative Modelling for Controllable Audio Synthesis of Expressive Piano Performance
Hao Hao Tan
Yin-Jyun Luo
Dorien Herremans
45
8
0
16 Jun 2020
Neural voice cloning with a few low-quality samples
Sunghee Jung
Hoi-Rim Kim
37
3
0
12 Jun 2020
MultiSpeech: Multi-Speaker Text to Speech with Transformer
Mingjian Chen
Xu Tan
Yi Ren
Jin Xu
Hao Sun
Sheng Zhao
Tao Qin
Tie-Yan Liu
65
110
0
08 Jun 2020
MHVAE: a Human-Inspired Deep Hierarchical Generative Model for Multimodal Representation Learning
Miguel Vasco
Francisco S. Melo
Ana Paiva
DRL
39
11
0
04 Jun 2020
Pitchtron: Towards audiobook generation from ordinary people's voices
Sunghee Jung
Hoi-Rim Kim
41
5
0
21 May 2020
Attentron: Few-Shot Text-to-Speech Utilizing Attention-Based Variable-Length Embedding
Seungwoo Choi
Seungju Han
Dongyoung Kim
S. Ha
91
67
0
18 May 2020
Semi-supervised Learning for Multi-speaker Text-to-speech Synthesis Using Discrete Speech Representation
Tao Tu
Yuan-Jui Chen
Alexander H. Liu
Hung-yi Lee
54
7
0
16 May 2020
You Do Not Need More Data: Improving End-To-End Speech Recognition by Text-To-Speech Data Augmentation
A. Laptev
Roman Korostik
A. Svischev
A. Andrusenko
Ivan Medennikov
S. Rybin
81
61
0
14 May 2020
Flowtron: an Autoregressive Flow-based Generative Network for Text-to-Speech Synthesis
Rafael Valle
Kevin J. Shih
R. Prenger
Bryan Catanzaro
96
121
0
12 May 2020
Jukebox: A Generative Model for Music
Prafulla Dhariwal
Heewoo Jun
Christine Payne
Jong Wook Kim
Alec Radford
Ilya Sutskever
VLM
171
758
0
30 Apr 2020
The Attacker's Perspective on Automatic Speaker Verification: An Overview
Rohan Kumar Das
Xiaohai Tian
Tomi Kinnunen
Haizhou Li
AAML
68
80
0
19 Apr 2020
Unsupervised Style and Content Separation by Minimizing Mutual Information for Speech Synthesis
Ting-Yao Hu
A. Shrivastava
Oncel Tuzel
C. Dhir
57
32
0
09 Mar 2020
Deterministic Decoding for Discrete Data in Variational Autoencoders
Daniil Polykovskiy
Dmitry Vetrov
OffRL
69
8
0
04 Mar 2020
Fully-hierarchical fine-grained prosody modeling for interpretable speech synthesis
Guangzhi Sun
Yu Zhang
Ron J. Weiss
Yuan Cao
Heiga Zen
Yonghui Wu
56
130
0
06 Feb 2020
Generating diverse and natural text-to-speech samples using a quantized fine-grained VAE and auto-regressive prosody prior
Guangzhi Sun
Yu Zhang
Ron J. Weiss
Yuan Cao
Heiga Zen
Andrew Rosenberg
Bhuvana Ramabhadran
Yonghui Wu
DiffM
98
93
0
06 Feb 2020
WaveTTS: Tacotron-based TTS with Joint Time-Frequency Domain Loss
Rui Liu
Berrak Sisman
F. Bao
Guanglai Gao
Haizhou Li
125
14
0
02 Feb 2020
Unsupervised Representation Disentanglement using Cross Domain Features and Adversarial Learning in Variational Autoencoder based Voice Conversion
Wen-Chin Huang
Hao Luo
Hsin-Te Hwang
Chen-Chou Lo
Yu-Huai Peng
Yu Tsao
Hsin-Min Wang
DRL
63
42
0
22 Jan 2020