Deep Voice 3: Scaling Text-to-Speech with Convolutional Sequence Learning

20 October 2017
Wei Ping
Kainan Peng
Andrew Gibiansky
Sercan O. Arik
Ajay Kannan
Sharan Narang
Jonathan Raiman
John Miller
arXiv: 1710.07654

Papers citing "Deep Voice 3: Scaling Text-to-Speech with Convolutional Sequence Learning"

50 / 170 papers shown
Disentangling Style and Speaker Attributes for TTS Style Transfer
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2022
Xiaochun An
Frank Soong
Lei Xie
267
21
0
24 Jan 2022
MHTTS: Fast multi-head text-to-speech for spontaneous speech with imperfect transcription
IEEE International Conference on Tools with Artificial Intelligence (ICTAI), 2022
Dabiao Ma
Yitong Zhang
Meng Li
Feng Ye
75
1
0
19 Jan 2022
YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone
International Conference on Machine Learning (ICML), 2021
Edresson Casanova
Julian Weber
C. Shulby
Arnaldo Cândido Júnior
Eren Golge
M. Ponti
569
536
0
04 Dec 2021
Meta-TTS: Meta-Learning for Few-Shot Speaker Adaptive Text-to-Speech
Sung-Feng Huang
Chyi-Jiunn Lin
Da-Rong Liu
Yi-Chen Chen
Hung-yi Lee
386
70
0
07 Nov 2021
Emotional Prosody Control for Speech Generation
S. Sivaprasad
Saiteja Kosgi
Vineet Gandhi
146
20
0
07 Nov 2021
Intelligent Video Editing: Incorporating Modern Talking Face Generation Algorithms in a Video Editor
Anchit Gupta
Faizan Farooq Khan
Rudrabha Mukhopadhyay
Vinay P. Namboodiri
C. V. Jawahar
CVBM
173
6
0
16 Oct 2021
Neural Dubber: Dubbing for Videos According to Scripts
Chenxu Hu
Qiao Tian
Tingle Li
Yuping Wang
Yuxuan Wang
Hang Zhao
DiffM
VGen
226
50
0
15 Oct 2021
Adapting TTS models For New Speakers using Transfer Learning
Paarth Neekhara
Jason Chun Lok Li
Boris Ginsburg
192
19
0
12 Oct 2021
Hierarchical prosody modeling and control in non-autoregressive parallel neural TTS
T. Raitio
Jiangchuan Li
Shreyas Seshadri
183
26
0
06 Oct 2021
GANtron: Emotional Speech Synthesis with Generative Adversarial Networks
E. Hortal
Rodrigo Brechard Alarcia
GAN
77
2
0
06 Oct 2021
PortaSpeech: Portable and High-Quality Generative Text-to-Speech
Yi Ren
Jinglin Liu
Zhou Zhao
317
90
0
30 Sep 2021
On-device neural speech synthesis
Sivanand Achanta
Albert Antony
L. Golipour
Jiangchuan Li
T. Raitio
...
Francesco Rossi
Jennifer Shi
Jaimin Upadhyay
David Winarsky
Hepeng Zhang
222
19
0
17 Sep 2021
Cross-speaker emotion disentangling and transfer for end-to-end speech synthesis
Tao Li
Xinsheng Wang
Qicong Xie
Zhichao Wang
Linfu Xie
153
62
0
14 Sep 2021
A Survey on Audio Synthesis and Audio-Visual Multimodal Processing
Zhaofeng Shi
129
11
0
01 Aug 2021
Facetron: A Multi-speaker Face-to-Speech Model based on Cross-modal Latent Representations
European Signal Processing Conference (EUSIPCO), 2021
Seyun Um
Jihyun Kim
Jihyun Lee
Hong-Goo Kang
CVBM
282
4
0
26 Jul 2021
Interactive Storytelling for Children: A Case-study of Design and Development Considerations for Ethical Conversational AI
J. Chubb
S. Missaoui
S. Concannon
Liam Maloney
James Alfred Walker
138
43
0
20 Jul 2021
VAENAR-TTS: Variational Auto-Encoder based Non-AutoRegressive Text-to-Speech Synthesis
Hui Lu
Zhiyong Wu
Xixin Wu
Xu Li
Shiyin Kang
Xunying Liu
Helen Meng
93
15
0
07 Jul 2021
AdaSpeech 3: Adaptive Text to Speech for Spontaneous Style
Yuzi Yan
Xu Tan
Bohan Li
Guangyan Zhang
Tao Qin
Sheng Zhao
Yuan-Chung Shen
Weiqiang Zhang
Tie-Yan Liu
116
23
0
06 Jul 2021
EditSpeech: A Text Based Speech Editing System Using Partial Inference and Bidirectional Fusion
Daxin Tan
Liqun Deng
Y. Yeung
Xin Jiang
Xiao Chen
Tan Lee
143
50
0
04 Jul 2021
A Survey on Neural Speech Synthesis
Xu Tan
Tao Qin
Frank Soong
Tie-Yan Liu
AI4TS
287
427
0
29 Jun 2021
GANSpeech: Adversarial Training for High-Fidelity Multi-Speaker Speech Synthesis
Interspeech (Interspeech), 2021
Jinhyeok Yang
Jaesung Bae
Taejun Bak
Young-Ik Kim
Hoon-Young Cho
170
42
0
29 Jun 2021
Distilling the Knowledge from Conditional Normalizing Flows
Dmitry Baranchuk
Vladimir Aliev
Artem Babenko
BDL
180
4
0
24 Jun 2021
Improving Performance of Seen and Unseen Speech Style Transfer in End-to-end Neural TTS
Interspeech (Interspeech), 2021
Xiaochun An
Frank Soong
Lei Xie
278
9
0
18 Jun 2021
Ctrl-P: Temporal Control of Prosodic Variation for Speech Synthesis
D. Mohan
Qinmin Hu
Tian Huey Teh
Alexandra Torresquintero
C. Wallis
Marlene Staib
Lorenzo Foglianti
Jiameng Gao
Simon King
125
20
0
15 Jun 2021
Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
International Conference on Machine Learning (ICML), 2021
Jaehyeon Kim
Jungil Kong
Juhee Son
DRL
240
1,124
0
11 Jun 2021
Meta-StyleSpeech : Multi-Speaker Adaptive Text-to-Speech Generation
International Conference on Machine Learning (ICML), 2021
Dong Min
Dong Bok Lee
Eunho Yang
Sung Ju Hwang
282
206
0
06 Jun 2021
An objective evaluation of the effects of recording conditions and speaker characteristics in multi-speaker deep neural speech synthesis
International Conference on Knowledge-Based Intelligent Information & Engineering Systems (KES), 2021
Beáta Lőrincz
Adriana Stan
M. Giurgiu
66
2
0
03 Jun 2021
Speaker verification-derived loss and data augmentation for DNN-based multispeaker speech synthesis
European Signal Processing Conference (EUSIPCO), 2021
Beáta Lőrincz
Adriana Stan
M. Giurgiu
78
6
0
03 Jun 2021
ItôTTS and ItôWave: Linear Stochastic Differential Equation Is All You Need For Audio Generation
Shoule Wu
Ziqiang Shi
DiffM
206
11
0
17 May 2021
Interpreting intermediate convolutional layers of generative CNNs trained on waveforms
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2021
Gašper Beguš
Alan Zhou
240
8
0
19 Apr 2021
TalkNet 2: Non-Autoregressive Depth-Wise Separable Convolutional Model for Speech Synthesis with Explicit Pitch and Duration Prediction
Stanislav Beliaev
Boris Ginsburg
169
10
0
16 Apr 2021
SC-GlowTTS: an Efficient Zero-Shot Multi-Speaker Text-To-Speech Model
Interspeech (Interspeech), 2021
Edresson Casanova
C. Shulby
Eren Golge
Nicolas Müller
F. S. Oliveira
Arnaldo Cândido Júnior
A. S. Soares
S. Aluísio
M. Ponti
188
113
0
02 Apr 2021
Continual Speaker Adaptation for Text-to-Speech Synthesis
Hamed Hemati
Damian Borth
CLL
154
9
0
26 Mar 2021
AdaSpeech: Adaptive Text to Speech for Custom Voice
International Conference on Learning Representations (ICLR), 2021
Mingjian Chen
Xu Tan
Bohan Li
Yanqing Liu
Tao Qin
Sheng Zhao
Tie-Yan Liu
VLM
DiffM
198
211
0
01 Mar 2021
Deepfakes Generation and Detection: State-of-the-art, open challenges, countermeasures, and way forward
Momina Masood
M. Nawaz
K. Malik
A. Javed
Aun Irtaza
AAML
465
397
0
25 Feb 2021
VARA-TTS: Non-Autoregressive Text-to-Speech Synthesis based on Very Deep VAE with Residual Attention
Peng Liu
Yuewen Cao
Songxiang Liu
Na Hu
Guangzhi Li
Chao Weng
Jane Polak Scowcroft
149
23
0
12 Feb 2021
Voice Cloning: a Multi-Speaker Text-to-Speech Synthesis Approach based on Transfer Learning
Giuseppe Ruggiero
Enrico Zovato
Luigi Di Caro
V. Pollet
DiffM
104
14
0
10 Feb 2021
Expressive Neural Voice Cloning
Asian Conference on Machine Learning (ACML), 2021
Paarth Neekhara
Shehzeen Samarah Hussain
Shlomo Dubnov
F. Koushanfar
Julian McAuley
DiffM
121
36
0
30 Jan 2021
Whispered and Lombard Neural Speech Synthesis
Spoken Language Technology Workshop (SLT), 2021
Qiong Hu
T. Bleisch
Petko N. Petkov
T. Raitio
Erik Marchi
V. Lakshminarasimhan
128
15
0
13 Jan 2021
Few Shot Adaptive Normalization Driven Multi-Speaker Speech Synthesis
Neeraj Kumar
Srishti Goel
Ankur Narang
Brejesh Lall
113
5
0
14 Dec 2020
EfficientTTS: An Efficient and High-Quality Text-to-Speech Architecture
Chenfeng Miao
Shuang Liang
Zhencheng Liu
Minchuan Chen
Jun Ma
Shaojun Wang
Jing Xiao
143
43
0
07 Dec 2020
MelGlow: Efficient Waveform Generative Network Based on Location-Variable Convolution
Spoken Language Technology Workshop (SLT), 2020
Zhen Zeng
Jianzong Wang
Ning Cheng
Jing Xiao
129
8
0
03 Dec 2020
Synth2Aug: Cross-domain speaker recognition with TTS synthesized speech
Spoken Language Technology Workshop (SLT), 2020
Yiling Huang
Yutian Chen
Jason W. Pelecanos
Quan Wang
144
13
0
24 Nov 2020
Parallel Tacotron: Non-Autoregressive and Controllable TTS
Isaac Elias
Heiga Zen
Jonathan Shen
Yu Zhang
Ye Jia
Ron J. Weiss
Yonghui Wu
DRL
157
109
0
22 Oct 2020
Learning Speaker Embedding from Text-to-Speech
Jaejin Cho
Piotr Żelasko
Jesus Villalba
Shinji Watanabe
Najim Dehak
108
12
0
21 Oct 2020
Neural Speech Synthesis for Estonian
Liisa Rätsep
Liisi Piits
Hille Pajupuu
Indrek Hein
Mark Fišel
51
2
0
06 Oct 2020
HiFiSinger: Towards High-Fidelity Neural Singing Voice Synthesis
Jiawei Chen
Xu Tan
Jian Luan
Tao Qin
Tie-Yan Liu
VLM
188
104
0
03 Sep 2020
Prosody Learning Mechanism for Speech Synthesis System Without Text Length Limit
Interspeech (Interspeech), 2020
Zhen Zeng
Jianzong Wang
Ning Cheng
Jing Xiao
133
9
0
13 Aug 2020
LRSpeech: Extremely Low-Resource Speech Synthesis and Recognition
Knowledge Discovery and Data Mining (KDD), 2020
Jin Xu
Xu Tan
Yi Ren
Tao Qin
Jian Li
Sheng Zhao
Tie-Yan Liu
VLM
129
98
0
09 Aug 2020
An Overview of Voice Conversion and its Challenges: From Statistical Modeling to Deep Learning
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2020
Berrak Sisman
Junichi Yamagishi
Simon King
Haizhou Li
BDL
391
388
0
09 Aug 2020