Hierarchical Generative Modeling for Controllable Speech Synthesis

16 October 2018
Wei-Ning Hsu
Yu Zhang
Ron J. Weiss
Heiga Zen
Yonghui Wu
Yuxuan Wang
Yuan Cao
Ye Jia
Zhiwen Chen
Jonathan Shen
Patrick Nguyen
Ruoming Pang
    BDL

Papers citing "Hierarchical Generative Modeling for Controllable Speech Synthesis"

28 / 178 papers shown
Deep Representation Learning in Speech Processing: Challenges, Recent Advances, and Future Trends
S. Latif
R. Rana
Sara Khalifa
Raja Jurdak
Junaid Qadir
Björn W. Schuller
AI4TS
96
82
0
02 Jan 2020
Singing Voice Conversion with Disentangled Representations of Singer and Vocal Technique Using Variational Autoencoders
Yin-Jyun Luo
Chin-Chen Hsu
Kat R. Agres
Dorien Herremans
DRL
99
47
0
03 Dec 2019
Dynamic Prosody Generation for Speech Synthesis using Linguistics-Driven Acoustic Embedding Selection
Shubhi Tyagi
M. Nicolis
Jonas Rohnke
Thomas Drugman
Jaime Lorenzo-Trueba
77
32
0
02 Dec 2019
Using VAEs and Normalizing Flows for One-shot Text-To-Speech Synthesis of Expressive Speech
Vatsal Aggarwal
Marius Cotescu
N. Prateek
Jaime Lorenzo-Trueba
Roberto Barra-Chicote
93
31
0
28 Nov 2019
Prosody Transfer in Neural Text to Speech Using Global Pitch and Loudness Features
Siddharth Gururani
Kilol Gupta
D. Shah
Z. Shakeri
Jervis Pinto
68
15
0
21 Nov 2019
ESPnet-TTS: Unified, Reproducible, and Integratable Open Source End-to-End Text-to-Speech Toolkit
Tomoki Hayashi
Ryuichi Yamamoto
Katsuki Inoue
Takenori Yoshimura
Shinji Watanabe
Tomoki Toda
K. Takeda
Yu Zhang
Xu Tan
VLM
93
205
0
24 Oct 2019
Semi-Supervised Generative Modeling for Controllable Speech Synthesis
Raza Habib
Soroosh Mariooryad
Matt Shannon
Eric Battenberg
RJ Skerry-Ryan
Daisy Stanton
David Kao
Tom Bagby
BDL
68
48
0
03 Oct 2019
Speech Recognition with Augmented Synthesized Speech
Andrew Rosenberg
Yu Zhang
Bhuvana Ramabhadran
Ye Jia
Pedro J. Moreno
Yonghui Wu
Zelin Wu
69
128
0
25 Sep 2019
Sequence to Sequence Neural Speech Synthesis with Prosody Modification Capabilities
Slava Shechtman
A. Sorin
56
33
0
23 Sep 2019
DurIAN: Duration Informed Attention Network For Multimodal Synthesis
Chengzhu Yu
Heng Lu
Na Hu
Meng Yu
Chao Weng
...
Deyi Tuo
Shiyin Kang
Guangzhi Lei
Jane Polak Scowcroft
Dong Yu
CVBM
89
118
0
04 Sep 2019
Learning to Speak Fluently in a Foreign Language: Multilingual Speech Synthesis and Cross-Language Voice Cloning
Yu Zhang
Ron J. Weiss
Heiga Zen
Yonghui Wu
Zhiwen Chen
RJ Skerry-Ryan
Ye Jia
Andrew Rosenberg
Bhuvana Ramabhadran
76
189
0
09 Jul 2019
A Methodology for Controlling the Emotional Expressiveness in Synthetic Speech -- a Deep Learning approach
Noé Tits
40
10
0
05 Jul 2019
Improving Performance of End-to-End ASR on Numeric Sequences
Cal Peyser
Hao Zhang
Tara N. Sainath
Zelin Wu
AI4TS
63
36
0
01 Jul 2019
Learning Disentangled Representations of Timbre and Pitch for Musical Instrument Sounds Using Gaussian Mixture Variational Autoencoders
Yin-Jyun Luo
Kat R. Agres
Dorien Herremans
103
46
0
19 Jun 2019
Using generative modelling to produce varied intonation for speech synthesis
Zack Hodari
O. Watts
Simon King
67
29
0
10 Jun 2019
Effective Use of Variational Embedding Capacity in Expressive End-to-End Speech Synthesis
Eric Battenberg
Soroosh Mariooryad
Daisy Stanton
RJ Skerry-Ryan
Matt Shannon
David Kao
Tom Bagby
BDL
107
45
0
08 Jun 2019
MelNet: A Generative Model for Audio in the Frequency Domain
Sean Vasquez
M. Lewis
DiffM
85
132
0
04 Jun 2019
Non-Autoregressive Neural Text-to-Speech
Kainan Peng
Ming-Yu Liu
Z. Song
Kexin Zhao
101
40
0
21 May 2019
CHiVE: Varying Prosody in Speech Synthesis with a Linguistically Driven Dynamic Hierarchical Conditional Variational Network
V. Wan
Chun-an Chan
Tom Kenter
Jakub Vít
R. Clark
71
75
0
17 May 2019
Direct speech-to-speech translation with a sequence-to-sequence model
Ye Jia
Ron J. Weiss
Fadi Biadsy
Wolfgang Macherey
Melvin Johnson
Zhiwen Chen
Yonghui Wu
101
230
0
12 Apr 2019
LibriTTS: A Corpus Derived from LibriSpeech for Text-to-Speech
Heiga Zen
Viet Dang
R. Clark
Yu Zhang
Ron J. Weiss
Ye Jia
Zhiwen Chen
Yonghui Wu
164
959
0
05 Apr 2019
In Other News: A Bi-style Text-to-speech Model for Synthesizing Newscaster Voice with Limited Data
N. Prateek
Mateusz Lajszczak
Roberto Barra-Chicote
Thomas Drugman
Jaime Lorenzo-Trueba
Thomas Merritt
S. Ronanki
Trevor Wood
87
30
0
04 Apr 2019
Multi-reference Tacotron by Intercross Training for Style Disentangling,Transfer and Control in Speech Synthesis
Yanyao Bian
Changbin Chen
Yongguo Kang
Zhenglin Pan
77
46
0
04 Apr 2019
Visualization and Interpretation of Latent Spaces for Controlling Expressive Speech Synthesis through Audio Analysis
Noé Tits
Fengna Wang
Kevin El Haddad
Vincent Pagel
Thierry Dutoit
DiffM
88
39
0
27 Mar 2019
Lingvo: a Modular and Scalable Framework for Sequence-to-Sequence Modeling
Jonathan Shen
Patrick Nguyen
Yonghui Wu
Zhiwen Chen
Mengzhao Chen
...
William Chan
Shubham Toshniwal
Baohua Liao
M. Nirschl
Pat Rondon
VLM
113
211
0
21 Feb 2019
Unsupervised speech representation learning using WaveNet autoencoders
J. Chorowski
Ron J. Weiss
Samy Bengio
Aaron van den Oord
SSL
76
319
0
25 Jan 2019
Leveraging Weakly Supervised Data to Improve End-to-End Speech-to-Text Translation
Ye Jia
Melvin Johnson
Wolfgang Macherey
Ron J. Weiss
Yuan Cao
Chung-Cheng Chiu
Naveen Ari
Stella Laurenzo
Yonghui Wu
98
163
0
05 Nov 2018
A Variational Prosody Model for Mapping the Context-Sensitive Variation of Functional Prosodic Prototypes
B. Gerazov
Gérard Bailly
Omar Mohammed
Yi Xu
Philip N. Garner
63
7
0
22 Jun 2018