ResearchTrend.AI
Deep Voice 3: Scaling Text-to-Speech with Convolutional Sequence Learning

20 October 2017
Wei Ping
Kainan Peng
Andrew Gibiansky
Sercan Ö. Arik
Ajay Kannan
Sharan Narang
Jonathan Raiman
John Miller

Papers citing "Deep Voice 3: Scaling Text-to-Speech with Convolutional Sequence Learning"

50 / 57 papers shown
DPN-GAN: Inducing Periodic Activations in Generative Adversarial Networks for High-Fidelity Audio Synthesis
Zeeshan Ahmad
Shudi Bao
Meng Chen
18
0
0
14 May 2025
Style Description based Text-to-Speech with Conditional Prosodic Layer Normalization based Diffusion GAN
Neeraj Kumar
Ankur Narang
Brejesh Lall
DiffM
21
0
0
27 Oct 2023
ContextSpeech: Expressive and Efficient Text-to-Speech for Paragraph Reading
Yujia Xiao
Shaofei Zhang
Xi Wang
Xuejiao Tan
Lei He
Sheng Zhao
Frank Soong
Tan Lee
17
5
0
03 Jul 2023
Transformers in Speech Processing: A Survey
S. Latif
Aun Zaidi
Heriberto Cuayáhuitl
Fahad Shamshad
Moazzam Shoukat
Junaid Qadir
42
47
0
21 Mar 2023
Towards Building Text-To-Speech Systems for the Next Billion Users
Gokul Karthik Kumar
V. PraveenS.
Pratyush Kumar
Mitesh M. Khapra
Karthik Nandakumar
36
18
0
17 Nov 2022
Adapter-Based Extension of Multi-Speaker Text-to-Speech Model for New Speakers
Cheng-Ping Hsieh
Subhankar Ghosh
Boris Ginsburg
41
18
0
01 Nov 2022
Text-to-speech synthesis from dark data with evaluation-in-the-loop data selection
Kentaro Seki
Shinnosuke Takamichi
Takaaki Saeki
Hiroshi Saruwatari
19
6
0
26 Oct 2022
The Sound of Silence: Efficiency of First Digit Features in Synthetic Audio Detection
Daniele Mari
Federica Latora
Simone Milani
11
11
0
06 Oct 2022
Controllable Data Generation by Deep Learning: A Review
Shiyu Wang
Yuanqi Du
Xiaojie Guo
Bo Pan
Zhaohui Qin
Liang Zhao
29
28
0
19 Jul 2022
Show Me Your Face, And I'll Tell You How You Speak
Christen Millerdurai
L. A. Khaliq
Timon Ulrich
CVBM
60
0
0
28 Jun 2022
NaturalSpeech: End-to-End Text to Speech Synthesis with Human-Level Quality
Xu Tan
Jiawei Chen
Haohe Liu
Jian Cong
Chen Zhang
...
Lei He
Frank Soong
Tao Qin
Sheng Zhao
Tie-Yan Liu
38
211
0
09 May 2022
Heterogeneous Target Speech Separation
Hyunjae Cho
Wonbin Jung
Junhyeok Lee
Paris Smaragdis
Sanghyun Woo
46
26
0
07 Apr 2022
Residual-guided Personalized Speech Synthesis based on Face Image
Jianrong Wang
Zixuan Wang
Xiaosheng Hu
Xuewei Li
Qiang Fang
Li Liu
CVBM
17
16
0
01 Apr 2022
AdaSpeech 4: Adaptive Text to Speech in Zero-Shot Scenarios
Yihan Wu
Xu Tan
Bohan Li
Lei He
Sheng Zhao
Ruihua Song
Tao Qin
Tie-Yan Liu
VLM
DiffM
14
66
0
01 Apr 2022
Real time spectrogram inversion on mobile phone
Oleg Rybakov
Marco Tagliasacchi
Yunpeng Li
Liyang Jiang
Xia Zhang
Fadi Biadsy
13
4
0
01 Mar 2022
Synthesizing Dysarthric Speech Using Multi-talker TTS for Dysarthric Speech Recognition
M. Soleymanpour
Michael T. Johnson
Rahim Soleymanpour
J. Berry
27
27
0
27 Jan 2022
Disentangling Style and Speaker Attributes for TTS Style Transfer
Xiaochun An
Frank Soong
Lei Xie
54
18
0
24 Jan 2022
Meta-TTS: Meta-Learning for Few-Shot Speaker Adaptive Text-to-Speech
Sung-Feng Huang
Chyi-Jiunn Lin
Da-Rong Liu
Yi-Chen Chen
Hung-yi Lee
8
56
0
07 Nov 2021
Emotional Prosody Control for Speech Generation
S. Sivaprasad
Saiteja Kosgi
Vineet Gandhi
8
17
0
07 Nov 2021
Neural Dubber: Dubbing for Videos According to Scripts
Chenxu Hu
Qiao Tian
Tingle Li
Yuping Wang
Yuxuan Wang
Hang Zhao
DiffM
VGen
36
39
0
15 Oct 2021
GANtron: Emotional Speech Synthesis with Generative Adversarial Networks
E. Hortal
Rodrigo Brechard Alarcia
GAN
21
2
0
06 Oct 2021
EditSpeech: A Text Based Speech Editing System Using Partial Inference and Bidirectional Fusion
Daxin Tan
Liqun Deng
Y. Yeung
Xin Jiang
Xiao Chen
Tan Lee
26
37
0
04 Jul 2021
A Survey on Neural Speech Synthesis
Xu Tan
Tao Qin
Frank Soong
Tie-Yan Liu
AI4TS
18
352
0
29 Jun 2021
GANSpeech: Adversarial Training for High-Fidelity Multi-Speaker Speech Synthesis
Jinhyeok Yang
Jaesung Bae
Taejun Bak
Young-Ik Kim
Hoon-Young Cho
23
36
0
29 Jun 2021
Improving Performance of Seen and Unseen Speech Style Transfer in End-to-end Neural TTS
Xiaochun An
Frank Soong
Lei Xie
31
9
0
18 Jun 2021
Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
Jaehyeon Kim
Jungil Kong
Juhee Son
DRL
39
840
0
11 Jun 2021
AdaSpeech: Adaptive Text to Speech for Custom Voice
Mingjian Chen
Xu Tan
Bohan Li
Yanqing Liu
Tao Qin
Sheng Zhao
Tie-Yan Liu
VLM
DiffM
20
186
0
01 Mar 2021
VARA-TTS: Non-Autoregressive Text-to-Speech Synthesis based on Very Deep VAE with Residual Attention
Peng Liu
Yuewen Cao
Songxiang Liu
Na Hu
Guangzhi Li
Chao Weng
Dan Su
31
22
0
12 Feb 2021
Few Shot Adaptive Normalization Driven Multi-Speaker Speech Synthesis
Neeraj Kumar
Srishti Goel
Ankur Narang
Brejesh Lall
13
5
0
14 Dec 2020
Synth2Aug: Cross-domain speaker recognition with TTS synthesized speech
Yiling Huang
Yutian Chen
Jason W. Pelecanos
Quan Wang
12
11
0
24 Nov 2020
Parallel Tacotron: Non-Autoregressive and Controllable TTS
Isaac Elias
Heiga Zen
Jonathan Shen
Yu Zhang
Ye Jia
Ron J. Weiss
Yonghui Wu
DRL
17
102
0
22 Oct 2020
LRSpeech: Extremely Low-Resource Speech Synthesis and Recognition
Jin Xu
Xu Tan
Yi Ren
Tao Qin
Jian Li
Sheng Zhao
Tie-Yan Liu
VLM
16
90
0
09 Aug 2020
An Overview of Voice Conversion and its Challenges: From Statistical Modeling to Deep Learning
Berrak Sisman
Junichi Yamagishi
Simon King
Haizhou Li
BDL
27
316
0
09 Aug 2020
Quasi-Periodic WaveNet: An Autoregressive Raw Waveform Generative Model with Pitch-dependent Dilated Convolution Neural Network
Yi-Chiao Wu
Tomoki Hayashi
Patrick Lumban Tobing
Kazuhiro Kobayashi
T. Toda
13
18
0
11 Jul 2020
FastSpeech 2: Fast and High-Quality End-to-End Text to Speech
Yi Ren
Chenxu Hu
Xu Tan
Tao Qin
Sheng Zhao
Zhou Zhao
Tie-Yan Liu
45
1,354
0
08 Jun 2020
Universal Adversarial Perturbations: A Survey
Ashutosh Chaubey
Nikhil Agrawal
Kavya Barnwal
K. K. Guliani
Pramod Mehta
OOD
AAML
29
46
0
16 May 2020
Flowtron: an Autoregressive Flow-based Generative Network for Text-to-Speech Synthesis
Rafael Valle
Kevin J. Shih
R. Prenger
Bryan Catanzaro
13
119
0
12 May 2020
Direct Speech-to-image Translation
Jiguo Li
Xinfeng Zhang
Chuanmin Jia
Jizheng Xu
Li Zhang
Y. Wang
Siwei Ma
Wen Gao
28
29
0
07 Apr 2020
Vocoder-Based Speech Synthesis from Silent Videos
Daniel Michelsanti
Olga Slizovskaia
G. Haro
Emilia Gómez
Z. Tan
Jesper Jensen
27
31
0
06 Apr 2020
DeepFake Detection: Current Challenges and Next Steps
Siwei Lyu
52
158
0
11 Mar 2020
AlignTTS: Efficient Feed-Forward Text-to-Speech System without Explicit Alignment
Zhen Zeng
Jianzong Wang
Ning Cheng
Tian Xia
Jing Xiao
VLM
25
56
0
04 Mar 2020
Generating diverse and natural text-to-speech samples using a quantized fine-grained VAE and auto-regressive prosody prior
Guangzhi Sun
Yu Zhang
Ron J. Weiss
Yuan Cao
Heiga Zen
Andrew Rosenberg
Bhuvana Ramabhadran
Yonghui Wu
DiffM
14
92
0
06 Feb 2020
Generating Synthetic Audio Data for Attention-Based Speech Recognition Systems
Nick Rossenbach
Albert Zeyer
Ralf Schlüter
Hermann Ney
8
83
0
19 Dec 2019
Vision-Infused Deep Audio Inpainting
Hang Zhou
Ziwei Liu
Lingfeng Guo
Ping Luo
Dahua Lin
27
88
0
24 Oct 2019
High Fidelity Speech Synthesis with Adversarial Networks
Mikołaj Bińkowski
Jeff Donahue
Sander Dieleman
Aidan Clark
Erich Elsen
Norman Casagrande
Luis C. Cobo
Karen Simonyan
223
239
0
25 Sep 2019
Unpaired Image-to-Speech Synthesis with Multimodal Information Bottleneck
Shuang Ma
Daniel J. McDuff
Yale Song
20
22
0
19 Aug 2019
CHiVE: Varying Prosody in Speech Synthesis with a Linguistically Driven Dynamic Hierarchical Conditional Variational Network
V. Wan
Chun-an Chan
Tom Kenter
Jakub Vít
R. Clark
13
75
0
17 May 2019
Almost Unsupervised Text to Speech and Automatic Speech Recognition
Yi Ren
Xu Tan
Tao Qin
Sheng Zhao
Zhou Zhao
Tie-Yan Liu
38
101
0
13 May 2019
Probability density distillation with generative adversarial networks for high-quality parallel waveform generation
Ryuichi Yamamoto
Eunwoo Song
Jae-Min Kim
11
55
0
09 Apr 2019
FPETS : Fully Parallel End-to-End Text-to-Speech System
Dabiao Ma
Zhiba Su
Wenxuan Wang
Yuhao Lu
19
6
0
12 Dec 2018