1712.05884
Cited By
v1
v2 (latest)
Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions
16 December 2017
Jonathan Shen
Ruoming Pang
Ron J. Weiss
M. Schuster
Navdeep Jaitly
Zongheng Yang
Zhiwen Chen
Yu Zhang
Yuxuan Wang
RJ Skerry-Ryan
Rif A. Saurous
Yannis Agiomyrgiannakis
Yonghui Wu
Papers citing
"Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions"
50 / 1,276 papers shown
Title
Synthesizing Speech from Intracranial Depth Electrodes using an Encoder-Decoder Framework
Jonas Köhler
Maarten C. Ottenhoff
Sophocles Goulis
Miguel Angrick
A. Colon
Louis Wagner
S. Tousseyn
P. Kubben
Christian Herff
52
28
0
02 Nov 2021
RefineGAN: Universally Generating Waveform Better than Ground Truth with Highly Accurate Pitch and Intensity Responses
Shengyuan Xu
Wenxiao Zhao
Jing Guo
63
12
0
01 Nov 2021
VRAIN-UPV MLLP's system for the Blizzard Challenge 2021
A. P. D. Martos
Albert Sanchis
Alfons Juan-Císcar
114
6
0
29 Oct 2021
TorchAudio: Building Blocks for Audio and Speech Processing
Yao-Yuan Yang
Moto Hira
Zhaoheng Ni
Anjali Chourdia
Artyom Astafurov
...
Mehrzad Samadi
Shinji Watanabe
Soumith Chintala
Vincent Quenneville-Bélair
Yangyang Shi
106
170
0
28 Oct 2021
Assessing Evaluation Metrics for Speech-to-Speech Translation
Elizabeth Salesky
Julian Mäder
Severin Klinger
74
15
0
26 Oct 2021
Beyond L_p clipping: Equalization-based Psychoacoustic Attacks against ASRs
H. Abdullah
Muhammad Sajidur Rahman
Christian Peeters
Cassidy Gibson
Washington Garcia
Vincent Bindschaedler
T. Shrimpton
Patrick Traynor
AAML
48
10
0
25 Oct 2021
DelightfulTTS: The Microsoft Speech Synthesis System for Blizzard Challenge 2021
Yanqing Liu
Rui Shao
G. Wang
Kuan Chen
Bohan Li
Pong C. Yuen
Jinzhu Li
Lei He
Sheng Zhao
91
55
0
25 Oct 2021
Discrete Acoustic Space for an Efficient Sampling in Neural Text-To-Speech
Mu Li
Jonas Rohnke
Antonio Bonafonte
Mateusz Lajszczak
Trevor Wood
DRL
100
2
0
24 Oct 2021
Synt++: Utilizing Imperfect Synthetic Data to Improve Speech Recognition
Ting-Yao Hu
Mohammadreza Armandpour
A. Shrivastava
Jen-Hao Rick Chang
H. Koppula
Oncel Tuzel
SyDa
87
42
0
21 Oct 2021
Speech Pattern based Black-box Model Watermarking for Automatic Speech Recognition
Haozhe Chen
Weiming Zhang
Kunlin Liu
Kejiang Chen
Han Fang
Nenghai Yu
37
4
0
19 Oct 2021
Improving Emotional Speech Synthesis by Using SUS-Constrained VAE and Text Encoder Aggregation
Fengyu Yang
Jian Luan
Yujun Wang
137
5
0
19 Oct 2021
Neural Lexicon Reader: Reduce Pronunciation Errors in End-to-end TTS by Leveraging External Textual Knowledge
Mutian He
Jingzhou Yang
Lei He
Frank Soong
86
1
0
19 Oct 2021
Intelligent Video Editing: Incorporating Modern Talking Face Generation Algorithms in a Video Editor
Anchit Gupta
Faizan Farooq Khan
Rudrabha Mukhopadhyay
Vinay P. Namboodiri
C. V. Jawahar
CVBM
78
6
0
16 Oct 2021
Neural Dubber: Dubbing for Videos According to Scripts
Chenxu Hu
Qiao Tian
Tingle Li
Yuping Wang
Yuxuan Wang
Hang Zhao
DiffM
VGen
99
43
0
15 Oct 2021
From Start to Finish: Latency Reduction Strategies for Incremental Speech Synthesis in Simultaneous Speech-to-Speech Translation
Danni Liu
Changhan Wang
Hongyu Gong
Xutai Ma
Yun Tang
J. Pino
98
4
0
15 Oct 2021
ESPnet2-TTS: Extending the Edge of TTS Research
Tomoki Hayashi
Ryuichi Yamamoto
Takenori Yoshimura
Peter Wu
Jiatong Shi
Takaaki Saeki
Yooncheol Ju
Yusuke Yasuda
Shinnosuke Takamichi
Shinji Watanabe
VLM
85
63
0
15 Oct 2021
SingGAN: Generative Adversarial Network For High-Fidelity Singing Voice Generation
Rongjie Huang
Chenye Cui
Feiyang Chen
Yi Ren
Jinglin Liu
Zhou Zhao
Baoxing Huai
N. Yuan
GAN
203
63
0
14 Oct 2021
FedSpeech: Federated Text-to-Speech with Continual Learning
Ziyue Jiang
Yi Ren
Ming Lei
Zhou Zhao
FedML
166
28
0
14 Oct 2021
Improve Cross-lingual Voice Cloning Using Low-quality Code-switched Data
Haitong Zhang
Yue Lin
56
0
0
14 Oct 2021
SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing
Junyi Ao
Rui Wang
Long Zhou
Chengyi Wang
Shuo Ren
...
Yu Zhang
Zhihua Wei
Yao Qian
Jinyu Li
Furu Wei
168
203
0
14 Oct 2021
Exploring Timbre Disentanglement in Non-Autoregressive Cross-Lingual Text-to-Speech
Haoyue Zhan
Xinyuan Yu
Haitong Zhang
Yang Zhang
Yue Lin
50
5
0
14 Oct 2021
Revisiting IPA-based Cross-lingual Text-to-speech
Haitong Zhang
Haoyue Zhan
Yang Zhang
Xinyuan Yu
Yue Lin
61
7
0
14 Oct 2021
A Melody-Unsupervision Model for Singing Voice Synthesis
Soonbeom Choi
Juhan Nam
67
14
0
13 Oct 2021
Fine-grained style control in Transformer-based Text-to-speech Synthesis
Li-Wei Chen
Alexander I. Rudnicky
169
31
0
12 Oct 2021
S3PRL-VC: Open-source Voice Conversion Framework with Self-supervised Speech Representations
Wen-Chin Huang
Shu-Wen Yang
Tomoki Hayashi
Hung-yi Lee
Shinji Watanabe
Tomoki Toda
71
40
0
12 Oct 2021
Adapting TTS models For New Speakers using Transfer Learning
Paarth Neekhara
Jason Chun Lok Li
Boris Ginsburg
144
15
0
12 Oct 2021
Complex Network-Based Approach for Feature Extraction and Classification of Musical Genres
M. Pimenta-Zanon
G. Bressan
Fabricio M. Lopes
27
1
0
09 Oct 2021
PAMA-TTS: Progression-Aware Monotonic Attention for Stable Seq2Seq TTS With Accurate Phoneme Duration Control
Yunchao He
Jian Luan
Yujun Wang
112
1
0
09 Oct 2021
Towards Lifelong Learning of Multilingual Text-To-Speech Synthesis
Mu Yang
Shaojin Ding
Tianlong Chen
Tong Wang
Zhangyang Wang
CLL
73
5
0
09 Oct 2021
Using multiple reference audios and style embedding constraints for speech synthesis
Cheng Gong
Longbiao Wang
Zhenhua Ling
Ju Zhang
Jianwu Dang
48
5
0
09 Oct 2021
Cross-speaker Emotion Transfer Based on Speaker Condition Layer Normalization and Semi-Supervised Training in Text-To-Speech
Pengfei Wu
Junjie Pan
Chenchang Xu
Junhui Zhang
Lin Wu
Xiang Yin
Zejun Ma
72
16
0
08 Oct 2021
KaraSinger: Score-Free Singing Voice Synthesis with VQ-VAE using Mel-spectrograms
Chien-Feng Liao
Jen-Yu Liu
Yi-Hsuan Yang
66
5
0
08 Oct 2021
Environment Aware Text-to-Speech Synthesis
Daxin Tan
Guangyan Zhang
Tan Lee
74
4
0
08 Oct 2021
A study on the efficacy of model pre-training in developing neural text-to-speech system
Guangyan Zhang
Yichong Leng
Daxin Tan
Ying Qin
Kaitao Song
Xu Tan
Sheng Zhao
Tan Lee
58
2
0
08 Oct 2021
Voice Reenactment with F0 and timing constraints and adversarial learning of conversions
F. Bous
L. Benaroya
Nicolas Obin
Axel Roebel
52
2
0
07 Oct 2021
Cloning one's voice using very limited data in the wild
Dongyang Dai
Yuan-Jui Chen
Li Chen
Ming Tu
Lu Liu
Rui Xia
Qiao Tian
Yuping Wang
Yuxuan Wang
SyDa
61
9
0
07 Oct 2021
VisualTTS: TTS with Accurate Lip-Speech Synchronization for Automatic Voice Over
Junchen Lu
Berrak Sisman
Rui Liu
Mingyang Zhang
Haizhou Li
DiffM
91
20
0
07 Oct 2021
Towards Universal Neural Vocoding with a Multi-band Excited WaveNet
Axel Roebel
F. Bous
56
2
0
07 Oct 2021
Automated Testing of AI Models
Swagatam Haldar
Deepak Vijaykeerthy
Diptikalyan Saha
VLM
44
0
0
07 Oct 2021
Emphasis control for parallel neural TTS
Shreyas Seshadri
T. Raitio
D. Castellani
Jiangchuan Li
120
11
0
06 Oct 2021
Hierarchical prosody modeling and control in non-autoregressive parallel neural TTS
T. Raitio
Jiangchuan Li
Shreyas Seshadri
78
23
0
06 Oct 2021
Style Equalization: Unsupervised Learning of Controllable Generative Sequence Models
Jen-Hao Rick Chang
A. Shrivastava
H. Koppula
Xiaoshuai Zhang
Oncel Tuzel
DiffM
111
16
0
06 Oct 2021
An Investigation of the Effectiveness of Phase for Audio Classification
Shunsuke Hidaka
Kohei Wakamiya
T. Kaburagi
28
4
0
06 Oct 2021
GANtron: Emotional Speech Synthesis with Generative Adversarial Networks
E. Hortal
Rodrigo Brechard Alarcia
GAN
46
2
0
06 Oct 2021
Decoupling Speaker-Independent Emotions for Voice Conversion Via Source-Filter Networks
Zhaojie Luo
Shoufeng Lin
Rui Liu
Jun Baba
Yuichiro Yoshikawa
H. Ishiguro
47
9
0
04 Oct 2021
On the Interplay Between Sparsity, Naturalness, Intelligibility, and Prosody in Speech Synthesis
Cheng-I Jeff Lai
Erica Cooper
Yang Zhang
Shiyu Chang
Kaizhi Qian
...
Yung-Sung Chuang
Alexander H. Liu
Junichi Yamagishi
David D. Cox
James R. Glass
69
6
0
04 Oct 2021
PortaSpeech: Portable and High-Quality Generative Text-to-Speech
Yi Ren
Jinglin Liu
Zhou Zhao
137
79
0
30 Sep 2021
VoiceFixer: Toward General Speech Restoration with Neural Vocoder
Haohe Liu
Qiuqiang Kong
Qiao Tian
Yan Zhao
DeLiang Wang
Chuanzeng Huang
Yuxuan Wang
87
58
0
28 Sep 2021
Nana-HDR: A Non-attentive Non-autoregressive Hybrid Model for TTS
Shilu Lin
Wenchao Su
Li Meng
Fenglong Xie
Xinhui Li
Li Lu
131
4
0
28 Sep 2021
Exploring Teacher-Student Learning Approach for Multi-lingual Speech-to-Intent Classification
Bidisha Sharma
Maulik C. Madhavi
Xuehao Zhou
Haizhou Li
54
2
0
28 Sep 2021