Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1812.04342
Cited By
v1
v2 (latest)
Learning latent representations for style control and transfer in end-to-end speech synthesis
11 December 2018
Ya-Jie Zhang
Shifeng Pan
Lei He
Zhenhua Ling
BDL
SSL
DRL
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"Learning latent representations for style control and transfer in end-to-end speech synthesis"
50 / 119 papers shown
Title
StyleTTS: A Style-Based Generative Model for Natural and Diverse Text-to-Speech Synthesis
Yinghao Aaron Li
Cong Han
N. Mesgarani
112
40
0
30 May 2022
ReCAB-VAE: Gumbel-Softmax Variational Inference Based on Analytic Divergence
Sangshin Oh
Seyun Um
Hong-Goo Kang
BDL
DRL
41
2
0
09 May 2022
Cross-Speaker Emotion Transfer for Low-Resource Text-to-Speech Using Non-Parallel Voice Conversion with Pitch-Shift Data Augmentation
Ryo Terashima
Ryuichi Yamamoto
Eunwoo Song
Yuma Shirahata
Hyun-Wook Yoon
Jae-Min Kim
Kentaro Tachibana
52
16
0
21 Apr 2022
Towards Multi-Scale Speaking Style Modelling with Hierarchical Context Information for Mandarin Speech Synthesis
Shunwei Lei
Yixuan Zhou
Liyang Chen
Jiankun Hu
Zhiyong Wu
Shiyin Kang
Helen Meng
79
10
0
06 Apr 2022
Towards Expressive Speaking Style Modelling with Hierarchical Context Information for Mandarin Speech Synthesis
Shunwei Lei
Yixuan Zhou
Liyang Chen
Zhiyong Wu
Shiyin Kang
Helen Meng
52
12
0
23 Mar 2022
Speaker Adaption with Intuitive Prosodic Features for Statistical Parametric Speech Synthesis
Pengyu Cheng
Zhenhua Ling
72
3
0
02 Mar 2022
Cross-speaker style transfer for text-to-speech using data augmentation
M. Ribeiro
Julian Roth
Giulia Comini
Goeric Huybrechts
Adam Gabry's
Jaime Lorenzo-Trueba
66
21
0
10 Feb 2022
Disentangling Style and Speaker Attributes for TTS Style Transfer
Xiaochun An
Frank Soong
Lei Xie
155
18
0
24 Jan 2022
MsEmoTTS: Multi-scale emotion transfer, prediction, and control for emotional speech synthesis
Yinjiao Lei
Shan Yang
Xinsheng Wang
Lei Xie
79
75
0
17 Jan 2022
Emotion Intensity and its Control for Emotional Voice Conversion
Kun Zhou
Berrak Sisman
R. Rana
Björn W. Schuller
Haizhou Li
172
58
0
10 Jan 2022
Multi-speaker Multi-style Text-to-speech Synthesis With Single-speaker Single-style Training Data Scenarios
Qicong Xie
Tao Li
Xinsheng Wang
Zhichao Wang
Lei Xie
Guoqiao Yu
Guanglu Wan
86
11
0
23 Dec 2021
How Speech is Recognized to Be Emotional - A Study Based on Information Decomposition
Haoran Sun
Lantian Li
Tianshi Zheng
Dong Wang
CVBM
51
0
0
24 Nov 2021
Prosodic Clustering for Phoneme-level Prosody Control in End-to-End Speech Synthesis
Alexandra Vioni
Myrsini Christidou
Nikolaos Ellinas
G. Vamvoukakis
Panos Kakoulidis
Taehoon Kim
June Sig Sung
Hyoungmin Park
Aimilios Chalamandaris
Pirros Tsiakoulis
60
11
0
19 Nov 2021
Word-Level Style Control for Expressive, Non-attentive Speech Synthesis
Konstantinos Klapsas
Nikolaos Ellinas
June Sig Sung
Hyoungmin Park
S. Raptis
144
9
0
19 Nov 2021
Improved Prosodic Clustering for Multispeaker and Speaker-independent Phoneme-level Prosody Control
Myrsini Christidou
Alexandra Vioni
Nikolaos Ellinas
G. Vamvoukakis
K. Markopoulos
Panos Kakoulidis
June Sig Sung
Hyoungmin Park
Aimilios Chalamandaris
Pirros Tsiakoulis
62
4
0
19 Nov 2021
More than Words: In-the-Wild Visually-Driven Prosody for Text-to-Speech
Michael Hassid
Michelle Tadmor Ramanovich
Brendan Shillingford
Miaosen Wang
Ye Jia
Tal Remez
DiffM
72
18
0
19 Nov 2021
Discrete Acoustic Space for an Efficient Sampling in Neural Text-To-Speech
Mu Li
Jonas Rohnke
Antonio Bonafonte
Mateusz Lajszczak
Trevor Wood
DRL
96
2
0
24 Oct 2021
Improving Emotional Speech Synthesis by Using SUS-Constrained VAE and Text Encoder Aggregation
Fengyu Yang
Jian Luan
Yujun Wang
137
5
0
19 Oct 2021
DeepA: A Deep Neural Analyzer For Speech And Singing Vocoding
Sergey Nikonorov
Berrak Sisman
Mingyang Zhang
Haizhou Li
39
3
0
13 Oct 2021
Using multiple reference audios and style embedding constraints for speech synthesis
Cheng Gong
Longbiao Wang
Zhenhua Ling
Ju Zhang
Jianwu Dang
48
5
0
09 Oct 2021
Cross-speaker Emotion Transfer Based on Speaker Condition Layer Normalization and Semi-Supervised Training in Text-To-Speech
Pengfei Wu
Junjie Pan
Chenchang Xu
Junhui Zhang
Lin Wu
Xiang Yin
Zejun Ma
62
16
0
08 Oct 2021
Hierarchical prosody modeling and control in non-autoregressive parallel neural TTS
T. Raitio
Jiangchuan Li
Shreyas Seshadri
78
23
0
06 Oct 2021
Unet-TTS: Improving Unseen Speaker and Style Transfer in One-shot Voice Cloning
Rui Li
dong Pu
Minnie Huang
Bill Huang
86
14
0
23 Sep 2021
LDC-VAE: A Latent Distribution Consistency Approach to Variational AutoEncoders
Xiaoyu Chen
Chen Gong
Qiang He
Xinwen Hou
Yu Liu
77
1
0
22 Sep 2021
Cross-speaker emotion disentangling and transfer for end-to-end speech synthesis
Tao Li
Xinsheng Wang
Qicong Xie
Zhichao Wang
Linfu Xie
67
47
0
14 Sep 2021
Daft-Exprt: Cross-Speaker Prosody Transfer on Any Text for Expressive Speech Synthesis
Julian Zaïdi
Hugo Seuté
Benjamin van Niekerk
M. Carbonneau
61
21
0
04 Aug 2021
Information Sieve: Content Leakage Reduction in End-to-End Prosody For Expressive Speech Synthesis
Xudong Dai
Cheng Gong
Longbiao Wang
Kaili Zhang
34
2
0
04 Aug 2021
Cross-speaker Style Transfer with Prosody Bottleneck in Neural Speech Synthesis
Shifeng Pan
Lei He
92
23
0
27 Jul 2021
On Prosody Modeling for ASR+TTS based Voice Conversion
Wen-Chin Huang
Tomoki Hayashi
Xinjian Li
Shinji Watanabe
Tomoki Toda
73
9
0
20 Jul 2021
Learning De-identified Representations of Prosody from Raw Audio
J. Weston
R. Lenain
U. Meepegama
E. Fristed
SSL
68
17
0
17 Jul 2021
A Survey on Neural Speech Synthesis
Xu Tan
Tao Qin
Frank Soong
Tie-Yan Liu
AI4TS
133
359
0
29 Jun 2021
FastPitchFormant: Source-filter based Decomposed Modeling for Speech Synthesis
Taejun Bak
Jaesung Bae
Hanbin Bae
Young-Ik Kim
Hoon-Young Cho
120
17
0
29 Jun 2021
Controllable Context-aware Conversational Speech Synthesis
Jian Cong
Shan Yang
Na Hu
Guangzhi Li
Lei Xie
Jane Polak Scowcroft
73
30
0
21 Jun 2021
Improving Performance of Seen and Unseen Speech Style Transfer in End-to-end Neural TTS
Xiaochun An
Frank Soong
Lei Xie
119
9
0
18 Jun 2021
Enriching Source Style Transfer in Recognition-Synthesis based Non-Parallel Voice Conversion
Zhichao Wang
Xinyong Zhou
Fengyu Yang
Tao Li
Hongqiang Du
Lei Xie
Wendong Gan
Haitao Chen
Hai Li
65
22
0
16 Jun 2021
A learned conditional prior for the VAE acoustic space of a TTS system
Panagiota Karanasou
S. Karlapati
Alexis Moinet
Arnaud Joly
Ammar Abbas
Simon Slangen
Jaime Lorenzo-Trueba
Thomas Drugman
74
7
0
14 Jun 2021
Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
Jaehyeon Kim
Jungil Kong
Juhee Son
DRL
167
902
0
11 Jun 2021
Speech BERT Embedding For Improving Prosody in Neural TTS
Liping Chen
Yan Deng
Xi Wang
Frank Soong
Lei He
89
23
0
08 Jun 2021
Learning Robust Latent Representations for Controllable Speech Synthesis
Shakti Kumar
Jithin Pradeep
Hussain Zaidi
DRL
68
6
0
10 May 2021
Review of end-to-end speech synthesis technology based on deep learning
Zhaoxi Mu
Xinyu Yang
Yizhuo Dong
AuLLM
ALM
94
25
0
20 Apr 2021
Enhancing Word-Level Semantic Representation via Dependency Structure for Expressive Text-to-Speech Synthesis
Yixuan Zhou
Changhe Song
Jingbei Li
Zhiyong Wu
Yanyao Bian
Jane Polak Scowcroft
Helen Meng
103
6
0
14 Apr 2021
Reinforcement Learning for Emotional Text-to-Speech Synthesis with Improved Emotion Discriminability
Rui Liu
Berrak Sisman
Haizhou Li
69
32
0
03 Apr 2021
Energy Disaggregation using Variational Autoencoders
A. Langevin
M. Carbonneau
M. Cheriet
G. Gagnon
DRL
57
59
0
22 Mar 2021
STYLER: Style Factor Modeling with Rapidity and Robustness via Speech Decomposition for Expressive and Controllable Neural Text to Speech
Keon Lee
Kyumin Park
Daeyoung Kim
69
32
0
17 Mar 2021
Text-Free Image-to-Speech Synthesis Using Learned Segmental Units
Wei-Ning Hsu
David Harwath
Christopher Song
James R. Glass
CLIP
90
67
0
31 Dec 2020
Parallel WaveNet conditioned on VAE latent vectors
Jonas Rohnke
Thomas Merritt
Jaime Lorenzo-Trueba
Adam Gabry's
Vatsal Aggarwal
Alexis Moinet
Roberto Barra-Chicote
74
3
0
17 Dec 2020
Few Shot Adaptive Normalization Driven Multi-Speaker Speech Synthesis
Neeraj Kumar
Srishti Goel
Ankur Narang
Brejesh Lall
68
5
0
14 Dec 2020
GraphPB: Graphical Representations of Prosody Boundary in Speech Synthesis
Aolan Sun
Jianzong Wang
Ning Cheng
Huayi Peng
Zhen Zeng
Lingwei Kong
Jing Xiao
62
9
0
03 Dec 2020
Controllable Emotion Transfer For End-to-End Speech Synthesis
Tao Li
Shan Yang
Liumeng Xue
Lei Xie
79
74
0
17 Nov 2020
Fine-grained Emotion Strength Transfer, Control and Prediction for Emotional Speech Synthesis
Yinjiao Lei
Shan Yang
Lei Xie
88
56
0
17 Nov 2020
Previous
1
2
3
Next