arXiv:1710.07654
Deep Voice 3: Scaling Text-to-Speech with Convolutional Sequence Learning
20 October 2017
Wei Ping
Kainan Peng
Andrew Gibiansky
Sercan O. Arik
Ajay Kannan
Sharan Narang
Jonathan Raiman
John Miller
Papers citing
"Deep Voice 3: Scaling Text-to-Speech with Convolutional Sequence Learning"
50 / 170 papers shown
Title
Disentangling Style and Speaker Attributes for TTS Style Transfer
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2022
Xiaochun An
Frank Soong
Lei Xie
267
21
0
24 Jan 2022
MHTTS: Fast multi-head text-to-speech for spontaneous speech with imperfect transcription
IEEE International Conference on Tools with Artificial Intelligence (ICTAI), 2022
Dabiao Ma
Yitong Zhang
Meng Li
Feng Ye
75
1
0
19 Jan 2022
YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone
International Conference on Machine Learning (ICML), 2021
Edresson Casanova
Julian Weber
C. Shulby
Arnaldo Cândido Júnior
Eren Golge
M. Ponti
569
536
0
04 Dec 2021
Meta-TTS: Meta-Learning for Few-Shot Speaker Adaptive Text-to-Speech
Sung-Feng Huang
Chyi-Jiunn Lin
Da-Rong Liu
Yi-Chen Chen
Hung-yi Lee
386
70
0
07 Nov 2021
Emotional Prosody Control for Speech Generation
S. Sivaprasad
Saiteja Kosgi
Vineet Gandhi
146
20
0
07 Nov 2021
Intelligent Video Editing: Incorporating Modern Talking Face Generation Algorithms in a Video Editor
Anchit Gupta
Faizan Farooq Khan
Rudrabha Mukhopadhyay
Vinay P. Namboodiri
C. V. Jawahar
CVBM
173
6
0
16 Oct 2021
Neural Dubber: Dubbing for Videos According to Scripts
Chenxu Hu
Qiao Tian
Tingle Li
Yuping Wang
Yuxuan Wang
Hang Zhao
DiffM
VGen
226
50
0
15 Oct 2021
Adapting TTS models For New Speakers using Transfer Learning
Paarth Neekhara
Jason Chun Lok Li
Boris Ginsburg
192
19
0
12 Oct 2021
Hierarchical prosody modeling and control in non-autoregressive parallel neural TTS
T. Raitio
Jiangchuan Li
Shreyas Seshadri
183
26
0
06 Oct 2021
GANtron: Emotional Speech Synthesis with Generative Adversarial Networks
E. Hortal
Rodrigo Brechard Alarcia
GAN
77
2
0
06 Oct 2021
PortaSpeech: Portable and High-Quality Generative Text-to-Speech
Yi Ren
Jinglin Liu
Zhou Zhao
317
90
0
30 Sep 2021
On-device neural speech synthesis
Sivanand Achanta
Albert Antony
L. Golipour
Jiangchuan Li
T. Raitio
...
Francesco Rossi
Jennifer Shi
Jaimin Upadhyay
David Winarsky
Hepeng Zhang
222
19
0
17 Sep 2021
Cross-speaker emotion disentangling and transfer for end-to-end speech synthesis
Tao Li
Xinsheng Wang
Qicong Xie
Zhichao Wang
Linfu Xie
153
62
0
14 Sep 2021
A Survey on Audio Synthesis and Audio-Visual Multimodal Processing
Zhaofeng Shi
129
11
0
01 Aug 2021
Facetron: A Multi-speaker Face-to-Speech Model based on Cross-modal Latent Representations
European Signal Processing Conference (EUSIPCO), 2021
Seyun Um
Jihyun Kim
Jihyun Lee
Hong-Goo Kang
CVBM
282
4
0
26 Jul 2021
Interactive Storytelling for Children: A Case-study of Design and Development Considerations for Ethical Conversational AI
J. Chubb
S. Missaoui
S. Concannon
Liam Maloney
James Alfred Walker
138
43
0
20 Jul 2021
VAENAR-TTS: Variational Auto-Encoder based Non-AutoRegressive Text-to-Speech Synthesis
Hui Lu
Zhiyong Wu
Xixin Wu
Xu Li
Shiyin Kang
Xunying Liu
Helen Meng
93
15
0
07 Jul 2021
AdaSpeech 3: Adaptive Text to Speech for Spontaneous Style
Yuzi Yan
Xu Tan
Bohan Li
Guangyan Zhang
Tao Qin
Sheng Zhao
Yuan-Chung Shen
Weiqiang Zhang
Tie-Yan Liu
116
23
0
06 Jul 2021
EditSpeech: A Text Based Speech Editing System Using Partial Inference and Bidirectional Fusion
Daxin Tan
Liqun Deng
Y. Yeung
Xin Jiang
Xiao Chen
Tan Lee
143
50
0
04 Jul 2021
A Survey on Neural Speech Synthesis
Xu Tan
Tao Qin
Frank Soong
Tie-Yan Liu
AI4TS
287
427
0
29 Jun 2021
GANSpeech: Adversarial Training for High-Fidelity Multi-Speaker Speech Synthesis
Interspeech, 2021
Jinhyeok Yang
Jaesung Bae
Taejun Bak
Young-Ik Kim
Hoon-Young Cho
170
42
0
29 Jun 2021
Distilling the Knowledge from Conditional Normalizing Flows
Dmitry Baranchuk
Vladimir Aliev
Artem Babenko
BDL
180
4
0
24 Jun 2021
Improving Performance of Seen and Unseen Speech Style Transfer in End-to-end Neural TTS
Interspeech, 2021
Xiaochun An
Frank Soong
Lei Xie
278
9
0
18 Jun 2021
Ctrl-P: Temporal Control of Prosodic Variation for Speech Synthesis
D. Mohan
Qinmin Hu
Tian Huey Teh
Alexandra Torresquintero
C. Wallis
Marlene Staib
Lorenzo Foglianti
Jiameng Gao
Simon King
125
20
0
15 Jun 2021
Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
International Conference on Machine Learning (ICML), 2021
Jaehyeon Kim
Jungil Kong
Juhee Son
DRL
240
1,124
0
11 Jun 2021
Meta-StyleSpeech : Multi-Speaker Adaptive Text-to-Speech Generation
International Conference on Machine Learning (ICML), 2021
Dong Min
Dong Bok Lee
Eunho Yang
Sung Ju Hwang
282
206
0
06 Jun 2021
An objective evaluation of the effects of recording conditions and speaker characteristics in multi-speaker deep neural speech synthesis
International Conference on Knowledge-Based Intelligent Information & Engineering Systems (KES), 2021
Beáta Lőrincz
Adriana Stan
M. Giurgiu
66
2
0
03 Jun 2021
Speaker verification-derived loss and data augmentation for DNN-based multispeaker speech synthesis
European Signal Processing Conference (EUSIPCO), 2021
Beáta Lőrincz
Adriana Stan
M. Giurgiu
78
6
0
03 Jun 2021
ItôTTS and ItôWave: Linear Stochastic Differential Equation Is All You Need For Audio Generation
Shoule Wu
Ziqiang Shi
DiffM
206
11
0
17 May 2021
Interpreting intermediate convolutional layers of generative CNNs trained on waveforms
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2021
Gašper Beguš
Alan Zhou
240
8
0
19 Apr 2021
TalkNet 2: Non-Autoregressive Depth-Wise Separable Convolutional Model for Speech Synthesis with Explicit Pitch and Duration Prediction
Stanislav Beliaev
Boris Ginsburg
169
10
0
16 Apr 2021
SC-GlowTTS: an Efficient Zero-Shot Multi-Speaker Text-To-Speech Model
Interspeech, 2021
Edresson Casanova
C. Shulby
Eren Golge
Nicolas Müller
F. S. Oliveira
Arnaldo Cândido Júnior
A. S. Soares
S. Aluísio
M. Ponti
188
113
0
02 Apr 2021
Continual Speaker Adaptation for Text-to-Speech Synthesis
Hamed Hemati
Damian Borth
CLL
154
9
0
26 Mar 2021
AdaSpeech: Adaptive Text to Speech for Custom Voice
International Conference on Learning Representations (ICLR), 2021
Mingjian Chen
Xu Tan
Bohan Li
Yanqing Liu
Tao Qin
Sheng Zhao
Tie-Yan Liu
VLM
DiffM
198
211
0
01 Mar 2021
Deepfakes Generation and Detection: State-of-the-art, open challenges, countermeasures, and way forward
Momina Masood
M. Nawaz
K. Malik
A. Javed
Aun Irtaza
AAML
465
397
0
25 Feb 2021
VARA-TTS: Non-Autoregressive Text-to-Speech Synthesis based on Very Deep VAE with Residual Attention
Peng Liu
Yuewen Cao
Songxiang Liu
Na Hu
Guangzhi Li
Chao Weng
Dan Su
149
23
0
12 Feb 2021
Voice Cloning: a Multi-Speaker Text-to-Speech Synthesis Approach based on Transfer Learning
Giuseppe Ruggiero
Enrico Zovato
Luigi Di Caro
V. Pollet
DiffM
104
14
0
10 Feb 2021
Expressive Neural Voice Cloning
Asian Conference on Machine Learning (ACML), 2021
Paarth Neekhara
Shehzeen Samarah Hussain
Shlomo Dubnov
F. Koushanfar
Julian McAuley
DiffM
121
36
0
30 Jan 2021
Whispered and Lombard Neural Speech Synthesis
Spoken Language Technology Workshop (SLT), 2021
Qiong Hu
T. Bleisch
Petko N. Petkov
T. Raitio
Erik Marchi
V. Lakshminarasimhan
128
15
0
13 Jan 2021
Few Shot Adaptive Normalization Driven Multi-Speaker Speech Synthesis
Neeraj Kumar
Srishti Goel
Ankur Narang
Brejesh Lall
113
5
0
14 Dec 2020
EfficientTTS: An Efficient and High-Quality Text-to-Speech Architecture
Chenfeng Miao
Shuang Liang
Zhencheng Liu
Minchuan Chen
Jun Ma
Shaojun Wang
Jing Xiao
143
43
0
07 Dec 2020
MelGlow: Efficient Waveform Generative Network Based on Location-Variable Convolution
Spoken Language Technology Workshop (SLT), 2020
Zhen Zeng
Jianzong Wang
Ning Cheng
Jing Xiao
129
8
0
03 Dec 2020
Synth2Aug: Cross-domain speaker recognition with TTS synthesized speech
Spoken Language Technology Workshop (SLT), 2020
Yiling Huang
Yutian Chen
Jason W. Pelecanos
Quan Wang
144
13
0
24 Nov 2020
Parallel Tacotron: Non-Autoregressive and Controllable TTS
Isaac Elias
Heiga Zen
Jonathan Shen
Yu Zhang
Ye Jia
Ron J. Weiss
Yonghui Wu
DRL
157
109
0
22 Oct 2020
Learning Speaker Embedding from Text-to-Speech
Jaejin Cho
Piotr Żelasko
Jesus Villalba
Shinji Watanabe
Najim Dehak
108
12
0
21 Oct 2020
Neural Speech Synthesis for Estonian
Liisa Rätsep
Liisi Piits
Hille Pajupuu
Indrek Hein
Mark Fišel
51
2
0
06 Oct 2020
HiFiSinger: Towards High-Fidelity Neural Singing Voice Synthesis
Jiawei Chen
Xu Tan
Jian Luan
Tao Qin
Tie-Yan Liu
VLM
188
104
0
03 Sep 2020
Prosody Learning Mechanism for Speech Synthesis System Without Text Length Limit
Interspeech, 2020
Zhen Zeng
Jianzong Wang
Ning Cheng
Jing Xiao
133
9
0
13 Aug 2020
LRSpeech: Extremely Low-Resource Speech Synthesis and Recognition
Knowledge Discovery and Data Mining (KDD), 2020
Jin Xu
Xu Tan
Yi Ren
Tao Qin
Jian Li
Sheng Zhao
Tie-Yan Liu
VLM
129
98
0
09 Aug 2020
An Overview of Voice Conversion and its Challenges: From Statistical Modeling to Deep Learning
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2020
Berrak Sisman
Junichi Yamagishi
Simon King
Haizhou Li
BDL
391
388
0
09 Aug 2020