1712.05884
Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions
16 December 2017
Jonathan Shen
Ruoming Pang
Ron J. Weiss
M. Schuster
Navdeep Jaitly
Zongheng Yang
Zhiwen Chen
Yu Zhang
Yuxuan Wang
RJ Skerry-Ryan
Rif A. Saurous
Yannis Agiomyrgiannakis
Yonghui Wu
Papers citing
"Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions"
50 / 1,276 papers shown
Title
Using VAEs and Normalizing Flows for One-shot Text-To-Speech Synthesis of Expressive Speech
Vatsal Aggarwal
Marius Cotescu
N. Prateek
Jaime Lorenzo-Trueba
Roberto Barra-Chicote
84
31
0
28 Nov 2019
Jejueo Datasets for Machine Translation and Speech Synthesis
Kyubyong Park
Yo Joong Choe
Jiyeon Ham
19
5
0
27 Nov 2019
Neural Percussive Synthesis Parameterised by High-Level Timbral Features
António Ramires
Pritish Chandna
Xavier Favory
Emilia Gómez
Xavier Serra
69
23
0
25 Nov 2019
Prosody Transfer in Neural Text to Speech Using Global Pitch and Loudness Features
Siddharth Gururani
Kilol Gupta
D. Shah
Z. Shakeri
Jervis Pinto
68
15
0
21 Nov 2019
Emotional Voice Conversion using Multitask Learning with Text-to-speech
Tae-Ho Kim
Sungjae Cho
Shinkook Choi
Sejik Park
Soo-Young Lee
92
40
0
11 Nov 2019
A unified sequence-to-sequence front-end model for Mandarin text-to-speech synthesis
Junjie Pan
Xiang Yin
Zhiling Zhang
Shichao Liu
Yang Zhang
Zejun Ma
Yuxuan Wang
47
27
0
11 Nov 2019
Teacher-Student Training for Robust Tacotron-based TTS
Rui Liu
Berrak Sisman
Jingdong Li
F. Bao
Guanglai Gao
Haizhou Li
109
38
0
07 Nov 2019
Incremental Text-to-Speech Synthesis with Prefix-to-Prefix Framework
Mingbo Ma
Baigong Zheng
Kaibo Liu
Renjie Zheng
Hairong Liu
Kainan Peng
Kenneth Church
Liang Huang
66
31
0
07 Nov 2019
Emotional speech synthesis with rich and granularized control
Seyun Um
Sangshin Oh
Kyungguen Byun
Inseon Jang
C. Ahn
Hong-Goo Kang
74
90
0
05 Nov 2019
ASVspoof 2019: A large-scale public database of synthesized, converted and replayed speech
Xin Wang
Junichi Yamagishi
Massimiliano Todisco
Héctor Delgado
A. Nautsch
...
J. Bonastre
Avashna Govender
S. Ronanki
Jing-Xuan Zhang
Zhenhua Ling
83
12
0
05 Nov 2019
A comparative study of estimating articulatory movements from phoneme sequences and acoustic features
Abhayjeet Singh
Aravind Illa
P. Ghosh
36
8
0
31 Oct 2019
A novel cross-lingual voice cloning approach with a few text-free samples
Xinyong Zhou
Hao Che
Xiaorui Wang
Lei Xie
22
4
0
29 Oct 2019
Disentangling Timbre and Singing Style with Multi-singer Singing Synthesis System
Juheon Lee
Hyeong-Seok Choi
Junghyun Koo
Kyogu Lee
35
18
0
29 Oct 2019
Towards Unsupervised Speech Recognition and Synthesis with Quantized Speech Representation Learning
Alexander H. Liu
Tao Tu
Hung-yi Lee
Lin-Shan Lee
SSL
105
50
0
28 Oct 2019
Effect of choice of probability distribution, randomness, and search methods for alignment modeling in sequence-to-sequence text-to-speech synthesis using hard alignment
Yusuke Yasuda
Xin Wang
Junichi Yamagishi
21
2
0
28 Oct 2019
Mellotron: Multispeaker expressive voice synthesis by conditioning on rhythm, pitch and global style tokens
Rafael Valle
Jason Chun Lok Li
R. Prenger
Bryan Catanzaro
82
149
0
26 Oct 2019
Multi-Reference Neural TTS Stylization with Adversarial Cycle Consistency
M. Whitehill
Shuang Ma
Daniel J. McDuff
Yale Song
111
35
0
25 Oct 2019
Learning audio representations via phase prediction
Félix de Chaumont Quitry
Marco Tagliasacchi
Dominik Roblek
SSL
AI4TS
52
10
0
25 Oct 2019
Parallel WaveGAN: A fast waveform generation model based on generative adversarial networks with multi-resolution spectrogram
Ryuichi Yamamoto
Eunwoo Song
Jae-Min Kim
195
821
0
25 Oct 2019
Vision-Infused Deep Audio Inpainting
Hang Zhou
Ziwei Liu
Lingfeng Guo
Ping Luo
Dahua Lin
142
88
0
24 Oct 2019
Fast and High-Quality Singing Voice Synthesis System based on Convolutional Neural Networks
Kazuhiro Nakamura
Shinji Takaki
Kei Hashimoto
Keiichiro Oura
Yoshihiko Nankaku
K. Tokuda
84
19
0
24 Oct 2019
ESPnet-TTS: Unified, Reproducible, and Integratable Open Source End-to-End Text-to-Speech Toolkit
Tomoki Hayashi
Ryuichi Yamamoto
Katsuki Inoue
Takenori Yoshimura
Shinji Watanabe
Tomoki Toda
K. Takeda
Yu Zhang
Xu Tan
VLM
93
205
0
24 Oct 2019
Location-Relative Attention Mechanisms For Robust Long-Form Speech Synthesis
Eric Battenberg
RJ Skerry-Ryan
Soroosh Mariooryad
Daisy Stanton
David Kao
Matt Shannon
Tom Bagby
106
114
0
23 Oct 2019
Sequence-to-sequence Singing Synthesis Using the Feed-forward Transformer
Merlijn Blaauw
J. Bonada
73
55
0
22 Oct 2019
MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis
Kundan Kumar
Rithesh Kumar
T. Boissière
L. Gestin
Wei Zhen Teoh
Jose M. R. Sotelo
A. D. Brébisson
Yoshua Bengio
Aaron Courville
GAN
178
961
0
08 Oct 2019
Semi-Supervised Generative Modeling for Controllable Speech Synthesis
Raza Habib
Soroosh Mariooryad
Matt Shannon
Eric Battenberg
RJ Skerry-Ryan
Daisy Stanton
David Kao
Tom Bagby
BDL
68
48
0
03 Oct 2019
Attention Forcing for Sequence-to-sequence Model Training
Qingyun Dou
Yiting Lu
Joshua Efiong
Mark Gales
62
6
0
26 Sep 2019
Speech Recognition with Augmented Synthesized Speech
Andrew Rosenberg
Yu Zhang
Bhuvana Ramabhadran
Ye Jia
Pedro J. Moreno
Yonghui Wu
Zelin Wu
69
128
0
25 Sep 2019
High Fidelity Speech Synthesis with Adversarial Networks
Mikolaj Binkowski
Jeff Donahue
Sander Dieleman
Aidan Clark
Erich Elsen
Norman Casagrande
Luis C. Cobo
Karen Simonyan
309
240
0
25 Sep 2019
Sequence to Sequence Neural Speech Synthesis with Prosody Modification Capabilities
Slava Shechtman
A. Sorin
49
33
0
23 Sep 2019
Harnessing Indirect Training Data for End-to-End Automatic Speech Translation: Tricks of the Trade
J. Pino
Liezl Puzon
Jiatao Gu
Xutai Ma
Arya D. McCarthy
D. Gopinath
25
3
0
14 Sep 2019
A Comparative Study on Transformer vs RNN in Speech Applications
Shigeki Karita
Nanxin Chen
Tomoki Hayashi
Takaaki Hori
Hirofumi Inaguma
...
Ryuichi Yamamoto
Xiao-fei Wang
Shinji Watanabe
Takenori Yoshimura
Wangyou Zhang
94
722
0
13 Sep 2019
Preech: A System for Privacy-Preserving Speech Transcription
Shimaa Ahmed
Amrita Roy Chowdhury
Kassem Fawaz
P. Ramanathan
127
48
0
09 Sep 2019
DurIAN: Duration Informed Attention Network For Multimodal Synthesis
Chengzhu Yu
Heng Lu
Na Hu
Meng Yu
Chao Weng
...
Deyi Tuo
Shiyin Kang
Guangzhi Lei
Jane Polak Scowcroft
Dong Yu
CVBM
80
118
0
04 Sep 2019
Initial investigation of an encoder-decoder end-to-end TTS framework using marginalization of monotonic hard latent alignments
Yusuke Yasuda
Xin Wang
Junichi Yamagishi
58
8
0
30 Aug 2019
Maximizing Mutual Information for Tacotron
Peng Liu
Xixin Wu
Shiyin Kang
Guangzhi Li
Jane Polak Scowcroft
Dong Yu
86
16
0
30 Aug 2019
Unpaired Image-to-Speech Synthesis with Multimodal Information Bottleneck
Shuang Ma
Daniel J. McDuff
Yale Song
89
25
0
19 Aug 2019
Statistical Voice Conversion with Quasi-Periodic WaveNet Vocoder
Yi-Chiao Wu
Patrick Lumban Tobing
Tomoki Hayashi
Kazuhiro Kobayashi
Tomoki Toda
128
2
0
21 Jul 2019
DNN-based Speaker Embedding Using Subjective Inter-speaker Similarity for Multi-speaker Modeling in Speech Synthesis
Yuki Saito
Shinnosuke Takamichi
Hiroshi Saruwatari
40
10
0
19 Jul 2019
Forward-Backward Decoding for Regularizing End-to-End TTS
Yibin Zheng
Xi Wang
Lei He
Shifeng Pan
Frank Soong
Zhengqi Wen
J. Tao
41
13
0
18 Jul 2019
Hierarchical Sequence to Sequence Voice Conversion with Limited Data
P. Narayanan
Punarjay Chakravarty
F. Charette
G. Puskorius
53
3
0
15 Jul 2019
Multi-Speaker End-to-End Speech Synthesis
Jihyun Park
Kexin Zhao
Kainan Peng
Ming-Yu Liu
SyDa
74
19
0
09 Jul 2019
Learning to Speak Fluently in a Foreign Language: Multilingual Speech Synthesis and Cross-Language Voice Cloning
Yu Zhang
Ron J. Weiss
Heiga Zen
Yonghui Wu
Zhiwen Chen
RJ Skerry-Ryan
Ye Jia
Andrew Rosenberg
Bhuvana Ramabhadran
76
189
0
09 Jul 2019
Speech bandwidth extension with WaveNet
Archit Gupta
Brendan Shillingford
Yannis Assael
Thomas C. Walters
60
29
0
05 Jul 2019
Fine-grained robust prosody transfer for single-speaker neural text-to-speech
V. Klimkov
S. Ronanki
Jonas Rohnke
Thomas Drugman
AI4TS
89
82
0
04 Jul 2019
Polyphone Disambiguation for Mandarin Chinese Using Conditional Neural Network with Multi-level Embedding Features
Zexin Cai
Yaogen Yang
Chuxiong Zhang
Xiaoyi Qin
Ming Li
66
26
0
03 Jul 2019
Conditioned-U-Net: Introducing a Control Mechanism in the U-Net for Multiple Source Separations
Gabriel Meseguer-Brocal
Geoffroy Peeters
84
61
0
02 Jul 2019
Quasi-Periodic WaveNet Vocoder: A Pitch Dependent Dilated Convolution Model for Parametric Speech Generation
Yi-Chiao Wu
Tomoki Hayashi
Patrick Lumban Tobing
Kazuhiro Kobayashi
Tomoki Toda
46
16
0
01 Jul 2019
RUSLAN: Russian Spoken Language Corpus for Speech Synthesis
Lenar Gabdrakhmanov
Rustem Garaev
E. Razinkov
42
10
0
26 Jun 2019
End-to-End Emotional Speech Synthesis Using Style Tokens and Semi-Supervised Training
Peng Wu
Zhenhua Ling
Li-Juan Liu
Yuan Jiang
Hong-Chuan Wu
Lirong Dai
88
72
0
26 Jun 2019