1712.05884
Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions
16 December 2017
Jonathan Shen
Ruoming Pang
Ron J. Weiss
M. Schuster
Navdeep Jaitly
Zongheng Yang
Zhiwen Chen
Yu Zhang
Yuxuan Wang
RJ Skerry-Ryan
Rif A. Saurous
Yannis Agiomyrgiannakis
Yonghui Wu
Papers citing
"Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions"
50 / 1,276 papers shown
Title
Speech Synthesis as Augmentation for Low-Resource ASR
Deblin Bagchi
Shannon Wotherspoon
Zhuolin Jiang
P. Muthukumar
27
2
0
23 Dec 2020
Incremental Text-to-Speech Synthesis Using Pseudo Lookahead with Large Pretrained Language Model
Takaaki Saeki
Shinnosuke Takamichi
Hiroshi Saruwatari
55
16
0
23 Dec 2020
Parallel WaveNet conditioned on VAE latent vectors
Jonas Rohnke
Thomas Merritt
Jaime Lorenzo-Trueba
Adam Gabryś
Vatsal Aggarwal
Alexis Moinet
Roberto Barra-Chicote
74
3
0
17 Dec 2020
Few Shot Adaptive Normalization Driven Multi-Speaker Speech Synthesis
Neeraj Kumar
Srishti Goel
Ankur Narang
Brejesh Lall
68
5
0
14 Dec 2020
Syntactic representation learning for neural network based TTS with syntactic parse tree traversal
Changhe Song
Jingbei Li
Yixuan Zhou
Zhiyong Wu
Helen Meng
49
6
0
13 Dec 2020
DeepTalk: Vocal Style Encoding for Speaker Recognition and Speech Synthesis
Anurag Chowdhury
Arun Ross
Prabu David
38
5
0
09 Dec 2020
Using previous acoustic context to improve Text-to-Speech synthesis
Pilar Oplustil Gallegos
Simon King
70
11
0
07 Dec 2020
EfficientTTS: An Efficient and High-Quality Text-to-Speech Architecture
Chenfeng Miao
Shuang Liang
Zhencheng Liu
Minchuan Chen
Jun Ma
Shaojun Wang
Jing Xiao
67
38
0
07 Dec 2020
Text-to-speech for the hearing impaired
Josef Schlittenlacher
T. Baer
32
0
0
03 Dec 2020
MelGlow: Efficient Waveform Generative Network Based on Location-Variable Convolution
Zhen Zeng
Jianzong Wang
Ning Cheng
Jing Xiao
44
8
0
03 Dec 2020
GraphPB: Graphical Representations of Prosody Boundary in Speech Synthesis
Aolan Sun
Jianzong Wang
Ning Cheng
Huayi Peng
Zhen Zeng
Lingwei Kong
Jing Xiao
62
9
0
03 Dec 2020
FBWave: Efficient and Scalable Neural Vocoders for Streaming Text-To-Speech on the Edge
Bichen Wu
Qing He
Peizhao Zhang
T. Koehler
Kurt Keutzer
Peter Vajda
47
6
0
25 Nov 2020
Synth2Aug: Cross-domain speaker recognition with TTS synthesized speech
Yiling Huang
Yutian Chen
Jason W. Pelecanos
Quan Wang
100
12
0
24 Nov 2020
Exploring Voice Conversion based Data Augmentation in Text-Dependent Speaker Verification
Xiaoyi Qin
Yaogen Yang
Lin Yang
Xuyang Wang
Junjie Wang
Ming Li
49
0
0
21 Nov 2020
Empirical Evaluation of Deep Learning Model Compression Techniques on the WaveNet Vocoder
Sam Davis
Giuseppe Coccia
Sam Gooch
Julian Mack
36
0
0
20 Nov 2020
DeepRepair: Style-Guided Repairing for DNNs in the Real-world Operational Environment
Bing Yu
Hua Qi
Qing Guo
Felix Juefei Xu
Xiaofei Xie
Lei Ma
Jianjun Zhao
25
5
0
19 Nov 2020
Universal MelGAN: A Robust Neural Vocoder for High-Fidelity Waveform Generation in Multiple Domains
Won Jang
D. Lim
Jaesam Yoon
60
34
0
19 Nov 2020
Controllable Emotion Transfer For End-to-End Speech Synthesis
Tao Li
Shan Yang
Liumeng Xue
Lei Xie
79
74
0
17 Nov 2020
s-Transformer: Segment-Transformer for Robust Neural Speech Synthesis
Xi Wang
Huaiping Ming
Lei He
Frank Soong
43
5
0
17 Nov 2020
Fine-grained Emotion Strength Transfer, Control and Prediction for Emotional Speech Synthesis
Yinjiao Lei
Shan Yang
Lei Xie
88
56
0
17 Nov 2020
Learn2Sing: Target Speaker Singing Voice Synthesis by learning from a Singing Teacher
Heyang Xue
Shan Yang
Yinjiao Lei
Lei Xie
Xiulin Li
45
11
0
17 Nov 2020
Speech Prediction in Silent Videos using Variational Autoencoders
Ravindra Yadav
Ashish Sardana
Vinay P. Namboodiri
R. Hegde
VGen
DRL
63
23
0
14 Nov 2020
Hierarchical Prosody Modeling for Non-Autoregressive Speech Synthesis
C. Chien
Hung-yi Lee
91
36
0
12 Nov 2020
Using IPA-Based Tacotron for Data Efficient Cross-Lingual Speaker Adaptation and Pronunciation Enhancement
Hamed Hemati
Damian Borth
72
9
0
12 Nov 2020
Enhancing Low-Quality Voice Recordings Using Disentangled Channel Factor and Neural Waveform Model
Haoyu Li
Yang Ai
Junichi Yamagishi
76
2
0
10 Nov 2020
Pretraining Strategies, Waveform Model Choice, and Acoustic Configurations for Multi-Speaker End-to-End Speech Synthesis
Erica Cooper
Xin Wang
Yi Zhao
Yusuke Yasuda
Junichi Yamagishi
SyDa
50
3
0
10 Nov 2020
Denoising-and-Dereverberation Hierarchical Neural Vocoder for Robust Waveform Generation
Yang Ai
Haoyu Li
Xin Wang
Junichi Yamagishi
Zhenhua Ling
47
4
0
08 Nov 2020
Fine-grained Style Modeling, Transfer and Prediction in Text-to-Speech Synthesis via Phone-Level Content-Style Disentanglement
Daxin Tan
Tan Lee
116
21
0
08 Nov 2020
Wave-Tacotron: Spectrogram-free end-to-end text-to-speech synthesis
Ron J. Weiss
RJ Skerry-Ryan
Eric Battenberg
Soroosh Mariooryad
Diederik P. Kingma
99
101
0
06 Nov 2020
Large-scale multilingual audio visual dubbing
Yi Yang
Brendan Shillingford
Yannis Assael
Miaosen Wang
Wendi Liu
...
Eren Sezener
Luis C. Cobo
Misha Denil
Y. Aytar
Nando de Freitas
70
21
0
06 Nov 2020
Improving Prosody Modelling with Cross-Utterance BERT Embeddings for End-to-end Speech Synthesis
Guanghui Xu
Wei Song
Zhengchen Zhang
Chao Zhang
Xiaodong He
Bowen Zhou
62
50
0
06 Nov 2020
Prosodic Representation Learning and Contextual Sampling for Neural Text-to-Speech
S. Karlapati
Ammar Abbas
Zack Hodari
Alexis Moinet
Arnaud Joly
Panagiota Karanasou
Thomas Drugman
66
19
0
04 Nov 2020
Learning to Maximize Speech Quality Directly Using MOS Prediction for Neural Text-to-Speech
Yeunju Choi
Youngmoon Jung
Youngjoo Suh
Hoirin Kim
129
6
0
02 Nov 2020
FeatherTTS: Robust and Efficient attention based Neural TTS
Qiao Tian
Zewang Zhang
Chao-Jung Liu
Heng Lu
Linghui Chen
Bin Wei
P. He
Shan Liu
69
4
0
02 Nov 2020
PPG-based singing voice conversion with adversarial representation learning
Zhonghao Li
Benlai Tang
Xiang Yin
Yuan Wan
Linjia Xu
Chen Shen
Zejun Ma
59
37
0
28 Oct 2020
FragmentVC: Any-to-Any Voice Conversion by End-to-End Extracting and Fusing Fine-Grained Voice Fragments With Attention
Yist Y. Lin
C. Chien
Jheng-hao Lin
Hung-yi Lee
Lin-Shan Lee
60
79
0
27 Oct 2020
Speaker Anonymization with Distribution-Preserving X-Vector Generation for the VoicePrivacy Challenge 2020
H.C.M. Turner
Giulio Lovisotto
Ivan Martinovic
73
21
0
26 Oct 2020
TTS-by-TTS: TTS-driven Data Augmentation for Fast and High-Quality Speech Synthesis
Min-Jae Hwang
Ryuichi Yamamoto
Eunwoo Song
Jae-Min Kim
44
32
0
26 Oct 2020
Emotion controllable speech synthesis using emotion-unlabeled dataset with the assistance of cross-domain speech emotion recognition
Xiong Cai
Dongyang Dai
Zhiyong Wu
Xiang Li
Jingbei Li
Helen Meng
96
67
0
26 Oct 2020
GraphSpeech: Syntax-Aware Graph Attention Network For Neural Speech Synthesis
Rui Liu
Berrak Sisman
Haizhou Li
96
25
0
23 Oct 2020
Show and Speak: Directly Synthesize Spoken Description of Images
Xinsheng Wang
Siyuan Feng
Jihua Zhu
M. Hasegawa-Johnson
O. Scharenborg
152
4
0
23 Oct 2020
AISHELL-3: A Multi-speaker Mandarin TTS Corpus and the Baselines
Yao Shi
Hui Bu
Xin Xu
Shaojing Zhang
Ming Li
112
223
0
22 Oct 2020
The NTU-AISG Text-to-speech System for Blizzard Challenge 2020
Haobo Zhang
Tingzhi Mao
Haihua Xu
Hao-Ming Huang
76
1
0
22 Oct 2020
Parallel Tacotron: Non-Autoregressive and Controllable TTS
Isaac Elias
Heiga Zen
Jonathan Shen
Yu Zhang
Ye Jia
Ron J. Weiss
Yonghui Wu
DRL
76
103
0
22 Oct 2020
Learning Speaker Embedding from Text-to-Speech
Jaejin Cho
Piotr Żelasko
Jesus Villalba
Shinji Watanabe
Najim Dehak
66
11
0
21 Oct 2020
An Investigation of the Relation Between Grapheme Embeddings and Pronunciation for Tacotron-based Systems
Antoine Perquin
Erica Cooper
Junichi Yamagishi
27
1
0
21 Oct 2020
Fluent and Low-latency Simultaneous Speech-to-Speech Translation with Self-adaptive Training
Renjie Zheng
Mingbo Ma
Baigong Zheng
Kaibo Liu
Jiahong Yuan
Kenneth Church
Liang Huang
76
14
0
20 Oct 2020
End-to-End Text-to-Speech using Latent Duration based on VQ-VAE
Yusuke Yasuda
Xin Wang
Junichi Yamagishi
66
17
0
19 Oct 2020
Towards Natural Bilingual and Code-Switched Speech Synthesis Based on Mix of Monolingual Recordings and Cross-Lingual Voice Conversion
Shengkui Zhao
Trung Hieu Nguyen
Hao Wang
B. Ma
60
25
0
16 Oct 2020
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
Jungil Kong
Jaehyeon Kim
Jaekyoung Bae
183
1,954
0
12 Oct 2020