1712.05884
Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions
16 December 2017
Jonathan Shen
Ruoming Pang
Ron J. Weiss
M. Schuster
Navdeep Jaitly
Zongheng Yang
Zhiwen Chen
Yu Zhang
Yuxuan Wang
RJ Skerry-Ryan
Rif A. Saurous
Yannis Agiomyrgiannakis
Yonghui Wu
Papers citing
"Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions"
50 / 1,276 papers shown
Title
Estimating articulatory movements in speech production with transformer networks
Sathvik Udupa
Anwesha Roy
Abhayjeet Singh
Aravind Illa
P. Ghosh
58
16
0
11 Apr 2021
Flavored Tacotron: Conditional Learning for Prosodic-linguistic Features
Mahsa Elyasi
Gaurav Bharaj
44
2
0
08 Apr 2021
Exploring Machine Speech Chain for Domain Adaptation and Few-Shot Speaker Adaptation
Fengpeng Yue
Yan Deng
Lei He
Tom Ko
70
8
0
08 Apr 2021
Half-Truth: A Partially Fake Audio Detection Dataset
Jiangyan Yi
Ye Bai
J. Tao
Haoxin Ma
Zhengkun Tian
Chenglong Wang
Tao Wang
Ruibo Fu
83
85
0
08 Apr 2021
Towards Multi-Scale Style Control for Expressive Speech Synthesis
Xiang Li
Changhe Song
Jingbei Li
Zhiyong Wu
Jia Jia
Helen Meng
64
47
0
08 Apr 2021
The AS-NU System for the M2VoC Challenge
Cheng-Hung Hu
Yi-Chiao Wu
Wen-Chin Huang
Yu-Huai Peng
Yu-Wen Chen
Pin-Jui Ku
Tomoki Toda
Yu Tsao
Hsin-Min Wang
54
1
0
07 Apr 2021
Diff-TTS: A Denoising Diffusion Model for Text-to-Speech
Myeonghun Jeong
Hyeongju Kim
Sung Jun Cheon
Byoung Jin Choi
N. Kim
DiffM
70
197
0
03 Apr 2021
Reinforcement Learning for Emotional Text-to-Speech Synthesis with Improved Emotion Discriminability
Rui Liu
Berrak Sisman
Haizhou Li
69
32
0
03 Apr 2021
Attention Forcing for Machine Translation
Qingyun Dou
Yiting Lu
Potsawee Manakul
Xixin Wu
Mark Gales
60
7
0
02 Apr 2021
SC-GlowTTS: an Efficient Zero-Shot Multi-Speaker Text-To-Speech Model
Edresson Casanova
C. Shulby
Eren Golge
Nicolas Müller
F. S. Oliveira
Arnaldo Cândido Júnior
A. S. Soares
S. Aluísio
M. Ponti
77
100
0
02 Apr 2021
Assem-VC: Realistic Voice Conversion by Assembling Modern Speech Synthesis Techniques
Kang-Wook Kim
Seung-won Park
Junhyeok Lee
Myun-chul Joe
76
28
0
02 Apr 2021
Two Truths and a Lie: Exploring Soft Moderation of COVID-19 Misinformation with Amazon Alexa
Donald Gover
Filipo Sharevski
42
8
0
01 Apr 2021
Multi-rate attention architecture for fast streamable Text-to-speech spectrum modeling
Qing He
Zhiping Xiu
T. Koehler
Jilong Wu
75
7
0
01 Apr 2021
Fast DCTTS: Efficient Deep Convolutional Text-to-Speech
M. Kang
Jihyun Lee
Simin Kim
Injung Kim
54
6
0
01 Apr 2021
Using Python for Model Inference in Deep Learning
Zach DeVito
Jason Ansel
William Constable
Michael Suo
Ailing Zhang
K. Hazelwood
SyDa
BDL
33
4
0
01 Apr 2021
PnG BERT: Augmented BERT on Phonemes and Graphemes for Neural TTS
Ye Jia
Heiga Zen
Jonathan Shen
Yu Zhang
Yonghui Wu
SSL
103
84
0
28 Mar 2021
Continual Speaker Adaptation for Text-to-Speech Synthesis
Hamed Hemati
Damian Borth
CLL
77
9
0
26 Mar 2021
SwissDial: Parallel Multidialectal Corpus of Spoken Swiss German
Pelin Dogan-Schönberger
Julian Mäder
Thomas Hofmann
65
30
0
21 Mar 2021
STYLER: Style Factor Modeling with Rapidity and Robustness via Speech Decomposition for Expressive and Controllable Neural Text to Speech
Keon Lee
Kyumin Park
Daeyoung Kim
69
32
0
17 Mar 2021
Latent Space Explorations of Singing Voice Synthesis using DDSP
J. Alonso
Cumhur Erkut
145
12
0
12 Mar 2021
GAN Vocoder: Multi-Resolution Discriminator Is All You Need
J. You
Dalhyun Kim
Gyuhyeon Nam
Geumbyeol Hwang
Gyeongsu Chae
68
27
0
09 Mar 2021
CUHK-EE Voice Cloning System for ICASSP 2021 M2VoC Challenge
Daxin Tan
Hingpang Huang
Guangyan Zhang
Tan Lee
65
6
0
08 Mar 2021
Investigating on Incorporating Pretrained and Learnable Speaker Representations for Multi-Speaker Multi-Style Text-to-Speech
C. Chien
Jheng-hao Lin
Chien-yu Huang
Po-Chun Hsu
Hung-yi Lee
119
70
0
06 Mar 2021
Multilingual Byte2Speech Models for Scalable Low-resource Speech Synthesis
Mutian He
Jingzhou Yang
Lei He
Frank Soong
47
18
0
05 Mar 2021
WaveGuard: Understanding and Mitigating Audio Adversarial Examples
Shehzeen Samarah Hussain
Paarth Neekhara
Shlomo Dubnov
Julian McAuley
F. Koushanfar
AAML
90
74
0
04 Mar 2021
A Neural Text-to-Speech Model Utilizing Broadcast Data Mixed with Background Music
Hanbin Bae
Jaesung Bae
Young-Sun Joo
Young-Ik Kim
Hoon-Young Cho
29
2
0
04 Mar 2021
A Spectral Enabled GAN for Time Series Data Generation
Kaleb E. Smith
Anthony O. Smith
GAN
45
12
0
02 Mar 2021
AdaSpeech: Adaptive Text to Speech for Custom Voice
Mingjian Chen
Xu Tan
Bohan Li
Yanqing Liu
Tao Qin
Sheng Zhao
Tie-Yan Liu
VLM
DiffM
112
192
0
01 Mar 2021
MBNet: MOS Prediction for Synthesized Speech with Mean-Bias Network
Yichong Leng
Xu Tan
Sheng Zhao
Frank Soong
Xiang-Yang Li
Tao Qin
88
96
0
27 Feb 2021
MaskCycleGAN-VC: Learning Non-parallel Voice Conversion with Filling in Frames
Takuhiro Kaneko
Hirokazu Kameoka
Kou Tanaka
Nobukatsu Hojo
73
60
0
25 Feb 2021
Generating Human Readable Transcript for Automatic Speech Recognition with Pre-trained Language Model
Junwei Liao
Yu Shi
Ming Gong
Linjun Shou
Sefik Emre Eskimez
Liyang Lu
Hong Qu
Michael Zeng
36
9
0
22 Feb 2021
AISPEECH-SJTU accent identification system for the Accented English Speech Recognition Challenge
Houjun Huang
Xu Xiang
Yexin Yang
Rao Ma
Y. Qian
81
26
0
19 Feb 2021
PeriodNet: A non-autoregressive waveform generation model with a structure separating periodic and aperiodic components
Yukiya Hono
Shinji Takaki
Kei Hashimoto
Keiichiro Oura
Yoshihiko Nankaku
K. Tokuda
69
16
0
15 Feb 2021
VARA-TTS: Non-Autoregressive Text-to-Speech Synthesis based on Very Deep VAE with Residual Attention
Peng Liu
Yuewen Cao
Songxiang Liu
Na Hu
Guangzhi Li
Chao Weng
Jane Polak Scowcroft
95
22
0
12 Feb 2021
Voice Cloning: a Multi-Speaker Text-to-Speech Synthesis Approach based on Transfer Learning
Giuseppe Ruggiero
Enrico Zovato
Luigi Di Caro
V. Pollet
DiffM
63
10
0
10 Feb 2021
LightSpeech: Lightweight and Fast Text to Speech with Neural Architecture Search
Renqian Luo
Xu Tan
Rui Wang
Tao Qin
Jinzhu Li
Sheng Zhao
Enhong Chen
Tie-Yan Liu
64
62
0
08 Feb 2021
Towards Natural and Controllable Cross-Lingual Voice Conversion Based on Neural TTS Model and Phonetic Posteriorgram
Shengkui Zhao
Hao Wang
Trung Hieu Nguyen
B. Ma
51
20
0
03 Feb 2021
Generative Spoken Language Modeling from Raw Audio
Kushal Lakhotia
Evgeny Kharitonov
Wei-Ning Hsu
Yossi Adi
Adam Polyak
...
Tu Nguyen
Jade Copet
Alexei Baevski
A. Mohamed
Emmanuel Dupoux
AuLLM
292
366
0
01 Feb 2021
Universal Neural Vocoding with Parallel WaveNet
Yunlong Jiao
Adam Gabryś
Georgi Tinchev
Bartosz Putrycz
Daniel Korzekwa
V. Klimkov
81
42
0
01 Feb 2021
Rich Prosody Diversity Modelling with Phone-level Mixture Density Network
Chenpeng Du
K. Yu
167
17
0
01 Feb 2021
Triple M: A Practical Text-to-speech Synthesis System With Multi-guidance Attention And Multi-band Multi-time LPCNet
Shilu Lin
Fenglong Xie
Li Meng
Xinhui Li
Li Lu
72
0
0
30 Jan 2021
Expressive Neural Voice Cloning
Paarth Neekhara
Shehzeen Samarah Hussain
Shlomo Dubnov
F. Koushanfar
Julian McAuley
DiffM
59
30
0
30 Jan 2021
Improved parallel WaveGAN vocoder with perceptually weighted spectrogram loss
Eunwoo Song
Ryuichi Yamamoto
Min-Jae Hwang
Jin-Seob Kim
Ohsung Kwon
Jae-Min Kim
71
14
0
19 Jan 2021
EmoCat: Language-agnostic Emotional Voice Conversion
Bastian Schnell
Goeric Huybrechts
Bartek Perz
Thomas Drugman
Jaime Lorenzo-Trueba
89
11
0
14 Jan 2021
Generating coherent spontaneous speech and gesture from text
Simon Alexanderson
Éva Székely
G. Henter
Taras Kucherenko
Jonas Beskow
SLR
187
24
0
14 Jan 2021
Whispered and Lombard Neural Speech Synthesis
Qiong Hu
T. Bleisch
Petko N. Petkov
T. Raitio
Erik Marchi
V. Lakshminarasimhan
63
14
0
13 Jan 2021
A Survey on Deep Reinforcement Learning for Audio-Based Applications
S. Latif
Heriberto Cuayáhuitl
Farrukh Pervez
Fahad Shamshad
Hafiz Shehbaz Ali
Min Zhang
OffRL
123
75
0
01 Jan 2021
Text-Free Image-to-Speech Synthesis Using Learned Segmental Units
Wei-Ning Hsu
David Harwath
Christopher Song
James R. Glass
CLIP
90
67
0
31 Dec 2020
Building Multilingual TTS using Cross-Lingual Voice Conversion
Qinghua Sun
Kenji Nagamatsu
21
3
0
28 Dec 2020
The 2020 ESPnet update: new features, broadened applications, performance improvements, and future plans
Shinji Watanabe
Florian Boyer
Xuankai Chang
Pengcheng Guo
Tomoki Hayashi
...
Shigeki Karita
Chenda Li
Jing Shi
Aswin Shanmugam Subramanian
Wangyou Zhang
VLM
108
38
0
23 Dec 2020