Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1712.05884
Cited By
v1
v2 (latest)
Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions
16 December 2017
Jonathan Shen
Ruoming Pang
Ron J. Weiss
M. Schuster
Navdeep Jaitly
Zongheng Yang
Zhiwen Chen
Yu Zhang
Yuxuan Wang
RJ Skerry-Ryan
Rif A. Saurous
Yannis Agiomyrgiannakis
Yonghui Wu
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions"
50 / 1,276 papers shown
Title
NewsPod: Automatic and Interactive News Podcasts
Philippe Laban
Elicia Ye
Srujay Korlakunta
John F. Canny
Marti A. Hearst
54
22
0
15 Feb 2022
Distribution augmentation for low-resource expressive text-to-speech
Mateusz Lajszczak
Animesh Prasad
Arent van Korlaar
Bajibabu Bollepalli
Antonio Bonafonte
...
M. Nicolis
Alexis Moinet
Thomas Drugman
Trevor Wood
Elena Sokolova
61
7
0
13 Feb 2022
I'm Hearing (Different) Voices: Anonymous Voices to Protect User Privacy
H.C.M. Turner
Giulio Lovisotto
Simon Eberz
Ivan Martinovic
32
1
0
13 Feb 2022
Deep Performer: Score-to-Audio Music Performance Synthesis
Hao-Wen Dong
Cong Zhou
Taylor Berg-Kirkpatrick
Julian McAuley
83
17
0
12 Feb 2022
Cross-speaker style transfer for text-to-speech using data augmentation
M. Ribeiro
Julian Roth
Giulia Comini
Goeric Huybrechts
Adam Gabry's
Jaime Lorenzo-Trueba
74
21
0
10 Feb 2022
Building Synthetic Speaker Profiles in Text-to-Speech Systems
Jie Pu
Yi Meng
Oguz H. Elibol
48
2
0
07 Feb 2022
The HCCL-DKU system for fake audio generation task of the 2022 ICASSP ADD Challenge
Ziyi Chen
Hua Hua
Yuxiang Zhang
Ming Li
Pengyuan Zhang
102
0
0
29 Jan 2022
Synthesizing Dysarthric Speech Using Multi-talker TTS for Dysarthric Speech Recognition
M. Soleymanpour
Michael T. Johnson
Rahim Soleymanpour
J. Berry
82
30
0
27 Jan 2022
J-MAC: Japanese multi-speaker audiobook corpus for speech synthesis
Shinnosuke Takamichi
Wataru Nakata
Naoko Tanji
Hiroshi Saruwatari
AuLLM
77
7
0
26 Jan 2022
Zero-Shot Long-Form Voice Cloning with Dynamic Convolution Attention
Artem Gorodetskii
Ivan Ozhiganov
117
2
0
25 Jan 2022
Improving Adversarial Waveform Generation based Singing Voice Conversion with Harmonic Signals
Haohan Guo
Zhiping Zhou
Fanbo Meng
Kai-Chun Liu
100
16
0
25 Jan 2022
Disentangling Style and Speaker Attributes for TTS Style Transfer
Xiaochun An
Frank Soong
Lei Xie
155
18
0
24 Jan 2022
Polyphone disambiguation and accent prediction using pre-trained language models in Japanese TTS front-end
Rem Hida
Masaki Hamada
Chie Kamada
E. Tsunoo
Toshiyuki Sekiya
Toshiyuki Kumakura
34
7
0
24 Jan 2022
Cross-Lingual Text-to-Speech Using Multi-Task Learning and Speaker Classifier Joint Training
J. Yang
Lei He
93
11
0
20 Jan 2022
MHTTS: Fast multi-head text-to-speech for spontaneous speech with imperfect transcription
Dabiao Ma
Yitong Zhang
Meng Li
Feng Ye
39
1
0
19 Jan 2022
Opencpop: A High-Quality Open Source Chinese Popular Song Corpus for Singing Voice Synthesis
Yu Wang
Xinsheng Wang
Pengcheng Zhu
Jie Wu
Hanzhao Li
Heyang Xue
Yongmao Zhang
Lei Xie
Mengxiao Bi
109
103
0
19 Jan 2022
MsEmoTTS: Multi-scale emotion transfer, prediction, and control for emotional speech synthesis
Yinjiao Lei
Shan Yang
Xinsheng Wang
Lei Xie
81
75
0
17 Jan 2022
KazakhTTS2: Extending the Open-Source Kazakh TTS Corpus With More Data, Speakers, and Topics
Saida Mussakhojayeva
Yerbolat Khassanov
H. A. Varol
81
13
0
15 Jan 2022
A Practical Guide to Logical Access Voice Presentation Attack Detection
Xin Wang
Junichi Yamagishi
AAML
109
11
0
10 Jan 2022
Audio representations for deep learning in sound synthesis: A review
Anastasia Natsiou
Seán O'Leary
AI4TS
72
18
0
07 Jan 2022
A sinusoidal signal reconstruction method for the inversion of the mel-spectrogram
Anastasia Natsiou
Seán O'Leary
44
3
0
07 Jan 2022
IQDUBBING: Prosody modeling based on discrete self-supervised speech representation for expressive voice conversion
Wendong Gan
Bolong Wen
Yin Yan
Haitao Chen
Zhichao Wang
Hongqiang Du
Lei Xie
Kaixuan Guo
Hai Li
85
14
0
02 Jan 2022
Multi-speaker Multi-style Text-to-speech Synthesis With Single-speaker Single-style Training Data Scenarios
Qicong Xie
Tao Li
Xinsheng Wang
Zhichao Wang
Lei Xie
Guoqiao Yu
Guanglu Wan
86
11
0
23 Dec 2021
Forensic Analysis of Synthetically Generated Western Blot Images
S. Mandelli
D. Cozzolino
E. D. Cannas
J. P. Cardenuto
Daniel Moreira
...
Walter J. Scheirer
Anderson de Rezende Rocha
L. Verdoliva
Stefano Tubaro
Edward J. Delp
104
21
0
16 Dec 2021
Textless Speech-to-Speech Translation on Real Data
Ann Lee
Hongyu Gong
Paul-Ambroise Duquenne
Holger Schwenk
Peng-Jen Chen
...
Sravya Popuri
Yossi Adi
J. Pino
Jiatao Gu
Wei-Ning Hsu
105
150
0
15 Dec 2021
Generate Point Clouds with Multiscale Details from Graph-Represented Structures
Ximing Yang
Zhibo Zhang
Zhengfu He
Cheng Jin
3DPC
53
1
0
13 Dec 2021
LipSound2: Self-Supervised Pre-Training for Lip-to-Speech Reconstruction and Lip Reading
Leyuan Qu
C. Weber
S. Wermter
79
23
0
09 Dec 2021
Multi-speaker Emotional Text-to-speech Synthesizer
Sungjae Cho
Soo-Young Lee
41
1
0
07 Dec 2021
VocBench: A Neural Vocoder Benchmark for Speech Synthesis
Ehab A. AlBadawy
Andrew Gibiansky
Qing He
Jilong Wu
Ming-Ching Chang
Siwei Lyu
61
12
0
06 Dec 2021
YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone
Edresson Casanova
Julian Weber
C. Shulby
Arnaldo Cândido Júnior
Eren Golge
M. Ponti
249
415
0
04 Dec 2021
Robust End-to-End Focal Liver Lesion Detection using Unregistered Multiphase Computed Tomography Images
Sang-gil Lee
Eunji Kim
J. S. Bae
J. H. Kim
Sungroh Yoon
OOD
73
11
0
02 Dec 2021
How Deep Are the Fakes? Focusing on Audio Deepfake: A Survey
Zahra Khanjani
Gabrielle Watson
V. P Janeja
61
27
0
28 Nov 2021
Guided-TTS: A Diffusion Model for Text-to-Speech via Classifier Guidance
Heeseung Kim
Sungwon Kim
Sungroh Yoon
DiffM
BDL
131
112
0
23 Nov 2021
Prosodic Clustering for Phoneme-level Prosody Control in End-to-End Speech Synthesis
Alexandra Vioni
Myrsini Christidou
Nikolaos Ellinas
G. Vamvoukakis
Panos Kakoulidis
Taehoon Kim
June Sig Sung
Hyoungmin Park
Aimilios Chalamandaris
Pirros Tsiakoulis
60
11
0
19 Nov 2021
Word-Level Style Control for Expressive, Non-attentive Speech Synthesis
Konstantinos Klapsas
Nikolaos Ellinas
June Sig Sung
Hyoungmin Park
S. Raptis
144
9
0
19 Nov 2021
Improved Prosodic Clustering for Multispeaker and Speaker-independent Phoneme-level Prosody Control
Myrsini Christidou
Alexandra Vioni
Nikolaos Ellinas
G. Vamvoukakis
K. Markopoulos
Panos Kakoulidis
June Sig Sung
Hyoungmin Park
Aimilios Chalamandaris
Pirros Tsiakoulis
64
4
0
19 Nov 2021
More than Words: In-the-Wild Visually-Driven Prosody for Text-to-Speech
Michael Hassid
Michelle Tadmor Ramanovich
Brendan Shillingford
Miaosen Wang
Ye Jia
Tal Remez
DiffM
72
18
0
19 Nov 2021
Rapping-Singing Voice Synthesis based on Phoneme-level Prosody Control
K. Markopoulos
Nikolaos Ellinas
Alexandra Vioni
Myrsini Christidou
Panos Kakoulidis
...
Georgia Maniati
June Sig Sung
Hyoungmin Park
Pirros Tsiakoulis
Aimilios Chalamandaris
82
2
0
17 Nov 2021
Cross-lingual Low Resource Speaker Adaptation Using Phonological Features
Georgia Maniati
Nikolaos Ellinas
K. Markopoulos
G. Vamvoukakis
June Sig Sung
Hyoungmin Park
Aimilios Chalamandaris
Pirros Tsiakoulis
70
14
0
17 Nov 2021
High Quality Streaming Speech Synthesis with Low, Sentence-Length-Independent Latency
Nikolaos Ellinas
G. Vamvoukakis
K. Markopoulos
Aimilios Chalamandaris
Georgia Maniati
Panos Kakoulidis
S. Raptis
June Sig Sung
Hyoungmin Park
Pirros Tsiakoulis
139
37
0
17 Nov 2021
Improving Prosody for Unseen Texts in Speech Synthesis by Utilizing Linguistic Information and Noisy Data
Zhu Li
Yuqing Zhang
Mengxi Nie
Ming Yan
Mengnan He
Ruixiong Zhang
Caixia Gong
49
3
0
15 Nov 2021
Textless Speech Emotion Conversion using Discrete and Decomposed Representations
Felix Kreuk
Adam Polyak
Jade Copet
Eugene Kharitonov
Tu Nguyen
M. Rivière
Wei-Ning Hsu
Abdel-rahman Mohamed
Emmanuel Dupoux
Yossi Adi
114
34
0
14 Nov 2021
Meta-Voice: Fast few-shot style transfer for expressive voice cloning using meta learning
Songxiang Liu
Jane Polak Scowcroft
Dong Yu
64
10
0
14 Nov 2021
HAPSSA: Holistic Approach to PDF Malware Detection Using Signal and Statistical Analysis
Tajuddin Manhar Mohammed
L. Nataraj
Satish Chikkagoudar
S. Chandrasekaran
B. S. Manjunath
26
7
0
08 Nov 2021
Speaker Generation
Daisy Stanton
Matt Shannon
Soroosh Mariooryad
RJ Skerry-Ryan
Eric Battenberg
Tom Bagby
David Kao
96
30
0
07 Nov 2021
Meta-TTS: Meta-Learning for Few-Shot Speaker Adaptive Text-to-Speech
Sung-Feng Huang
Chyi-Jiunn Lin
Da-Rong Liu
Yi-Chen Chen
Hung-yi Lee
126
57
0
07 Nov 2021
Emotional Prosody Control for Speech Generation
S. Sivaprasad
Saiteja Kosgi
Vineet Gandhi
63
17
0
07 Nov 2021
WaveFake: A Data Set to Facilitate Audio Deepfake Detection
Joel Frank
Lea Schonherr
DiffM
204
131
0
04 Nov 2021
Voice Conversion Can Improve ASR in Very Low-Resource Settings
Matthew Baas
Herman Kamper
101
17
0
04 Nov 2021
A Comparison of Discrete and Soft Speech Units for Improved Voice Conversion
Benjamin van Niekerk
M. Carbonneau
Julian Zaïdi
Matthew Baas
Hugo Seuté
Herman Kamper
DRL
116
123
0
03 Nov 2021
Previous
1
2
3
...
13
14
15
...
24
25
26
Next