Hierarchical Generative Modeling for Controllable Speech Synthesis

16 October 2018
Wei-Ning Hsu
Yu Zhang
Ron J. Weiss
Heiga Zen
Yonghui Wu
Yuxuan Wang
Yuan Cao
Ye Jia
Zhiwen Chen
Jonathan Shen
Patrick Nguyen
Ruoming Pang
    BDL

Papers citing "Hierarchical Generative Modeling for Controllable Speech Synthesis"

50 / 178 papers shown
A Study of Modeling Rising Intonation in Cantonese Neural Speech Synthesis
Qibing Bai
Tom Ko
Yu Zhang
92
4
0
03 Aug 2022
Generative Extraction of Audio Classifiers for Speaker Identification
Tejumade Afonja
Lucas Bourtoule
Varun Chandrasekaran
Sageev Oore
Nicolas Papernot
AAML
61
1
0
26 Jul 2022
Controllable Data Generation by Deep Learning: A Review
Shiyu Wang
Yuanqi Du
Xiaojie Guo
Bo Pan
Zhaohui Qin
Liang Zhao
99
28
0
19 Jul 2022
Controllable and Lossless Non-Autoregressive End-to-End Text-to-Speech
Zhengxi Liu
Qiao Tian
Chenxu Hu
Xudong Liu
Meng-Che Wu
Yuping Wang
Hang Zhao
Yuxuan Wang
87
10
0
13 Jul 2022
Automatic Evaluation of Speaker Similarity
Kamil Deja
Ariadna Sánchez
Julian Roth
Marius Cotescu
50
6
0
01 Jul 2022
End-to-End Text-to-Speech Based on Latent Representation of Speaking Styles Using Spontaneous Dialogue
Kentaro Mitsui
Tianyu Zhao
Kei Sawada
Yukiya Hono
Yoshihiko Nankaku
K. Tokuda
67
14
0
24 Jun 2022
StyleTTS: A Style-Based Generative Model for Natural and Diverse Text-to-Speech Synthesis
Yinghao Aaron Li
Cong Han
N. Mesgarani
112
40
0
30 May 2022
Cross-Utterance Conditioned VAE for Non-Autoregressive Text-to-Speech
Yongqian Li
Cheng Yu
Guangzhi Sun
Hua Jiang
Fanglei Sun
Weiqin Zu
Ying Wen
Yang Yang
Jun Wang
53
7
0
09 May 2022
Cross-Speaker Emotion Transfer for Low-Resource Text-to-Speech Using Non-Parallel Voice Conversion with Pitch-Shift Data Augmentation
Ryo Terashima
Ryuichi Yamamoto
Eunwoo Song
Yuma Shirahata
Hyun-Wook Yoon
Jae-Min Kim
Kentaro Tachibana
52
16
0
21 Apr 2022
Hierarchical and Multi-Scale Variational Autoencoder for Diverse and Natural Non-Autoregressive Text-to-Speech
Jaesung Bae
Jinhyeok Yang
Taejun Bak
Young-Sun Joo
DiffM
126
6
0
08 Apr 2022
Into-TTS : Intonation Template Based Prosody Control System
Jihwan Lee
Joun Yeop Lee
Heejin Choi
Seongkyu Mun
Sangjun Park
Jae-Sung Bae
Chanwoo Kim
135
4
0
04 Apr 2022
Content-Dependent Fine-Grained Speaker Embedding for Zero-Shot Speaker Adaptation in Text-to-Speech Synthesis
Yixuan Zhou
Changhe Song
Xiang Li
Lu Zhang
Zhiyong Wu
Yanyao Bian
Jane Polak Scowcroft
Helen Meng
139
23
0
03 Apr 2022
DRSpeech: Degradation-Robust Text-to-Speech Synthesis with Frame-Level and Utterance-Level Acoustic Representation Learning
Takaaki Saeki
Kentaro Tachibana
Ryuichi Yamamoto
53
11
0
29 Mar 2022
MuSE-SVS: Multi-Singer Emotional Singing Voice Synthesizer that Controls Emotional Intensity
Sungjae Kim
Y.E. Kim
Jewoo Jun
Injung Kim
107
14
0
02 Mar 2022
SpeechPainter: Text-conditioned Speech Inpainting
Zalan Borsos
Matthew Sharifi
Marco Tagliasacchi
93
28
0
15 Feb 2022
Unsupervised word-level prosody tagging for controllable speech synthesis
Yiwei Guo
Chenpeng Du
Kai Yu
67
15
0
15 Feb 2022
Building Synthetic Speaker Profiles in Text-to-Speech Systems
Jie Pu
Yi Meng
Oguz H. Elibol
40
2
0
07 Feb 2022
Disentangling Style and Speaker Attributes for TTS Style Transfer
Xiaochun An
Frank Soong
Lei Xie
155
18
0
24 Jan 2022
MsEmoTTS: Multi-scale emotion transfer, prediction, and control for emotional speech synthesis
Yinjiao Lei
Shan Yang
Xinsheng Wang
Lei Xie
79
75
0
17 Jan 2022
Conditional Deep Hierarchical Variational Autoencoder for Voice Conversion
K. Akuzawa
Kotaro Onishi
Keisuke Takiguchi
Kohki Mametani
K. Mori
BDL
DRL
70
7
0
06 Dec 2021
V2C: Visual Voice Cloning
Qi Chen
Yuanqing Li
Yuankai Qi
Jiaqiu Zhou
Mingkui Tan
Qi Wu
VGen
81
27
0
25 Nov 2021
Prosodic Clustering for Phoneme-level Prosody Control in End-to-End Speech Synthesis
Alexandra Vioni
Myrsini Christidou
Nikolaos Ellinas
G. Vamvoukakis
Panos Kakoulidis
Taehoon Kim
June Sig Sung
Hyoungmin Park
Aimilios Chalamandaris
Pirros Tsiakoulis
60
11
0
19 Nov 2021
Word-Level Style Control for Expressive, Non-attentive Speech Synthesis
Konstantinos Klapsas
Nikolaos Ellinas
June Sig Sung
Hyoungmin Park
S. Raptis
144
9
0
19 Nov 2021
Improved Prosodic Clustering for Multispeaker and Speaker-independent Phoneme-level Prosody Control
Myrsini Christidou
Alexandra Vioni
Nikolaos Ellinas
G. Vamvoukakis
K. Markopoulos
Panos Kakoulidis
June Sig Sung
Hyoungmin Park
Aimilios Chalamandaris
Pirros Tsiakoulis
62
4
0
19 Nov 2021
More than Words: In-the-Wild Visually-Driven Prosody for Text-to-Speech
Michael Hassid
Michelle Tadmor Ramanovich
Brendan Shillingford
Miaosen Wang
Ye Jia
Tal Remez
DiffM
72
18
0
19 Nov 2021
Cross-lingual Low Resource Speaker Adaptation Using Phonological Features
Georgia Maniati
Nikolaos Ellinas
K. Markopoulos
G. Vamvoukakis
June Sig Sung
Hyoungmin Park
Aimilios Chalamandaris
Pirros Tsiakoulis
64
14
0
17 Nov 2021
Zero-shot Singing Technique Conversion
Brendan O'Connor
S. Dixon
Georgy Fazekas
58
5
0
16 Nov 2021
Discrete Acoustic Space for an Efficient Sampling in Neural Text-To-Speech
Mu Li
Jonas Rohnke
Antonio Bonafonte
Mateusz Lajszczak
Trevor Wood
DRL
96
2
0
24 Oct 2021
Variational Predictive Routing with Nested Subjective Timescales
Alexey Zakharov
Qinghai Guo
Zafeirios Fountas
BDL
AI4TS
67
9
0
21 Oct 2021
CycleFlow: Purify Information Factors by Cycle Loss
Haoran Sun
Chen Chen
Lantian Li
Dong Wang
65
1
0
18 Oct 2021
PixelPyramids: Exact Inference Models from Lossless Image Pyramids
Shweta Mahajan
Stefan Roth
TPM
51
2
0
17 Oct 2021
Emphasis control for parallel neural TTS
Shreyas Seshadri
T. Raitio
D. Castellani
Jiangchuan Li
120
11
0
06 Oct 2021
Hierarchical prosody modeling and control in non-autoregressive parallel neural TTS
T. Raitio
Jiangchuan Li
Shreyas Seshadri
78
23
0
06 Oct 2021
Style Equalization: Unsupervised Learning of Controllable Generative Sequence Models
Jen-Hao Rick Chang
A. Shrivastava
H. Koppula
Xiaoshuai Zhang
Oncel Tuzel
DiffM
111
16
0
06 Oct 2021
"Hello, It's Me": Deep Learning-based Speech Synthesis Attacks in the Real World
Emily Wenger
Max Bronckers
Christian Cianfarani
Jenna Cryan
Angela Sha
Haitao Zheng
Ben Y. Zhao
AAML
79
40
0
20 Sep 2021
fairseq S^2: A Scalable and Integrable Speech Synthesis Toolkit
Changhan Wang
Wei-Ning Hsu
Yossi Adi
Adam Polyak
Ann Lee
Peng-Jen Chen
Jiatao Gu
J. Pino
VLM
106
32
0
14 Sep 2021
Referee: Towards reference-free cross-speaker style transfer with low-quality data for expressive speech synthesis
Songxiang Liu
Shan Yang
Jane Polak Scowcroft
Dong Yu
AI4TS
62
10
0
08 Sep 2021
Speaker-Conditioned Hierarchical Modeling for Automated Speech Scoring
Yaman Kumar Singla
Avykat Gupta
Shaurya Bagga
Changyou Chen
Balaji Krishnamurthy
R. Shah
86
12
0
30 Aug 2021
Injecting Text in Self-Supervised Speech Pretraining
Zhehuai Chen
Yu Zhang
Andrew Rosenberg
Bhuvana Ramabhadran
Gary Wang
Pedro J. Moreno
SSL
90
36
0
27 Aug 2021
Enhancing audio quality for expressive Neural Text-to-Speech
Abdelhamid Ezzerg
Adam Gabryś
Bartosz Putrycz
Daniel Korzekwa
Daniel Sáez-Trigueros
David McHardy
Kamil Pokora
Jakub Lachowicz
Jaime Lorenzo-Trueba
V. Klimkov
132
6
0
13 Aug 2021
Daft-Exprt: Cross-Speaker Prosody Transfer on Any Text for Expressive Speech Synthesis
Julian Zaïdi
Hugo Seuté
Benjamin van Niekerk
M. Carbonneau
61
21
0
04 Aug 2021
Information Sieve: Content Leakage Reduction in End-to-End Prosody For Expressive Speech Synthesis
Xudong Dai
Cheng Gong
Longbiao Wang
Kaili Zhang
34
2
0
04 Aug 2021
On Prosody Modeling for ASR+TTS based Voice Conversion
Wen-Chin Huang
Tomoki Hayashi
Xinjian Li
Shinji Watanabe
Tomoki Toda
73
9
0
20 Jul 2021
Msdtron: a high-capability multi-speaker speech synthesis system for diverse data using characteristic information
Qinghua Wu
Quanbo Shen
Jian Luan
YuJun Wang
72
4
0
07 Jul 2021
Multi-Scale Spectrogram Modelling for Neural Text-to-Speech
Ammar Abbas
Bajibabu Bollepalli
Alexis Moinet
Arnaud Joly
Penny Karanasou
Peter Makarov
Simon Slangens
S. Karlapati
Thomas Drugman
67
0
0
29 Jun 2021
A Survey on Neural Speech Synthesis
Xu Tan
Tao Qin
Frank Soong
Tie-Yan Liu
AI4TS
133
359
0
29 Jun 2021
FastPitchFormant: Source-filter based Decomposed Modeling for Speech Synthesis
Taejun Bak
Jaesung Bae
Hanbin Bae
Young-Ik Kim
Hoon-Young Cho
120
17
0
29 Jun 2021
UniTTS: Residual Learning of Unified Embedding Space for Speech Style Control
M. Kang
Sungjae Kim
Injung Kim
77
3
0
21 Jun 2021
Improving Performance of Seen and Unseen Speech Style Transfer in End-to-end Neural TTS
Xiaochun An
Frank Soong
Lei Xie
119
9
0
18 Jun 2021
Ctrl-P: Temporal Control of Prosodic Variation for Speech Synthesis
D. Mohan
Qinmin Hu
Tian Huey Teh
Alexandra Torresquintero
C. Wallis
Marlene Staib
Lorenzo Foglianti
Jiameng Gao
Simon King
55
17
0
15 Jun 2021