Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1712.05884
Cited By
v1
v2 (latest)
Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions
16 December 2017
Jonathan Shen
Ruoming Pang
Ron J. Weiss
M. Schuster
Navdeep Jaitly
Zongheng Yang
Zhiwen Chen
Yu Zhang
Yuxuan Wang
RJ Skerry-Ryan
Rif A. Saurous
Yannis Agiomyrgiannakis
Yonghui Wu
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions"
50 / 1,276 papers shown
Title
Dict-TTS: Learning to Pronounce with Prior Dictionary Knowledge for Text-to-Speech
Ziyue Jiang
Zhe Su
Zhou Zhao
Qian Yang
Yi Ren
Jinglin Liu
Zhe Ye
73
5
0
05 Jun 2022
Pronunciation Dictionary-Free Multilingual Speech Synthesis by Combining Unsupervised and Supervised Phonetic Representations
Chang Liu
Zhenhua Ling
Linghui Chen
68
3
0
02 Jun 2022
StyleTTS: A Style-Based Generative Model for Natural and Diverse Text-to-Speech Synthesis
Yinghao Aaron Li
Cong Han
N. Mesgarani
112
40
0
30 May 2022
Guided-TTS 2: A Diffusion Model for High-quality Adaptive Text-to-Speech with Untranscribed Data
Sungwon Kim
Heeseung Kim
Sung-Hoon Yoon
DiffM
249
53
0
30 May 2022
Merkel Podcast Corpus: A Multimodal Dataset Compiled from 16 Years of Angela Merkel's Weekly Video Podcasts
Debjoy Saha
Shravan Nayak
Timo Baumann
87
3
0
24 May 2022
TDASS: Target Domain Adaptation Speech Synthesis Framework for Multi-speaker Low-Resource TTS
Xulong Zhang
Jianzong Wang
Ning Cheng
Jing Xiao
67
14
0
24 May 2022
Self-Supervised Speech Representation Learning: A Review
Abdel-rahman Mohamed
Hung-yi Lee
Lasse Borgholt
Jakob Drachmann Havtorn
Joakim Edin
...
Shang-Wen Li
Karen Livescu
Lars Maaløe
Tara N. Sainath
Shinji Watanabe
SSL
AI4TS
291
368
0
21 May 2022
PaddleSpeech: An Easy-to-Use All-in-One Speech Toolkit
Hui Zhang
Tian Yuan
Junkun Chen
Xintong Li
Renjie Zheng
...
Zeyu Chen
Xiaoguang Hu
Dianhai Yu
Yanjun Ma
Liang Huang
AuLLM
74
28
0
20 May 2022
Macedonian Speech Synthesis for Assistive Technology Applications
B. Sofronievski
Elena Velovska
Martin Velichkovski
Violeta Argirova
Tea Veljkovikj
...
Kristijan Lazarev
Toni Bachvarovski
Z. Ivanovski
Dimitar Tashkovski
B. Gerazov
20
0
0
18 May 2022
GenerSpeech: Towards Style Transfer for Generalizable Out-Of-Domain Text-to-Speech
Rongjie Huang
Yi Ren
Jinglin Liu
Chenye Cui
Zhou Zhao
OODD
VLM
195
34
0
15 May 2022
Improved Consistency Training for Semi-Supervised Sequence-to-Sequence ASR via Speech Chain Reconstruction and Self-Transcribing
Heli Qi
Sashi Novitasari
S. Sakti
Satoshi Nakamura
AI4TS
142
2
0
14 May 2022
Talking Face Generation with Multilingual TTS
Hyoung-Kyu Song
Sanghyun Woo
Junhyeok Lee
S. Yang
Hyunjae Cho
Youseong Lee
Dongho Choi
Kang-Wook Kim
CVBM
80
22
0
13 May 2022
Deep Learning and Synthetic Media
Raphaël Millière
69
18
0
11 May 2022
Deep Learning Enabled Semantic Communications with Speech Recognition and Synthesis
Zhenzi Weng
Zhijin Qin
Xiaoming Tao
Chengkang Pan
Guangyi Liu
Geoffrey Ye Li
87
143
0
09 May 2022
NaturalSpeech: End-to-End Text to Speech Synthesis with Human-Level Quality
Xu Tan
Jiawei Chen
Haohe Liu
Jian Cong
Chen Zhang
...
Lei He
Frank Soong
Tao Qin
Sheng Zhao
Tie-Yan Liu
141
221
0
09 May 2022
Cross-Utterance Conditioned VAE for Non-Autoregressive Text-to-Speech
Yongqian Li
Cheng Yu
Guangzhi Sun
Hua Jiang
Fanglei Sun
Weiqin Zu
Ying Wen
Yang Yang
Jun Wang
53
7
0
09 May 2022
ReCAB-VAE: Gumbel-Softmax Variational Inference Based on Analytic Divergence
Sangshin Oh
Seyun Um
Hong-Goo Kang
BDL
DRL
43
2
0
09 May 2022
Muskits: an End-to-End Music Processing Toolkit for Singing Voice Synthesis
Jiatong Shi
Shuai Guo
Tao Qian
Nan Huo
Tomoki Hayashi
...
Xuankai Chang
Hua-Wei Li
Peter Wu
Shinji Watanabe
Qin Jin
VLM
111
27
0
09 May 2022
Synthetic Data -- what, why and how?
James Jordon
Lukasz Szpruch
F. Houssiau
M. Bottarelli
Giovanni Cherubin
Carsten Maple
Samuel N. Cohen
Adrian Weller
96
120
0
06 May 2022
SVTS: Scalable Video-to-Speech Synthesis
Rodrigo Mira
A. Haliassos
Stavros Petridis
Björn W. Schuller
Maja Pantic
71
35
0
04 May 2022
How does a spontaneously speaking conversational agent affect user behavior?
Takahisa Iizuka
H. Mori
17
3
0
02 May 2022
DeepGraviLens: a Multi-Modal Architecture for Classifying Gravitational Lensing Data
Nicolò Oreste Pinciroli Vago
Piero Fraternali
94
3
0
02 May 2022
Regotron: Regularizing the Tacotron2 architecture via monotonic alignment loss
Efthymios Georgiou
Kosmas Kritsis
Georgios Paraskevopoulos
Athanasios Katsamanis
Vassilis Katsouros
Alexandros Potamianos
126
3
0
28 Apr 2022
An Overview of Recent Work in Media Forensics: Methods and Threats
Kratika Bhagtani
A. Yadav
Emily R. Bartusiak
Ziyue Xiang
Ruiting Shao
Sriram Baireddy
Edward J. Delp
AAML
94
25
0
26 Apr 2022
Parallel Synthesis for Autoregressive Speech Generation
Po-Chun Hsu
Da-Rong Liu
Andy T. Liu
Hung-yi Lee
80
5
0
25 Apr 2022
Improving Self-Supervised Learning-based MOS Prediction Networks
Bálint Gyires-Tóth
Csaba Zainkó
SSL
38
1
0
23 Apr 2022
LibriS2S: A German-English Speech-to-Speech Translation Corpus
Pedro Jeuris
Jan Niehues
AuLLM
29
3
0
22 Apr 2022
Layer-wise Fast Adaptation for End-to-End Multi-Accent Speech Recognition
Xun Gong
Y. Qian
Houjun Huang
Yanmin Qian
81
46
0
21 Apr 2022
Time Domain Adversarial Voice Conversion for ADD 2022
Cheng Wen
Tingwei Guo
Xi Tan
Rui Yan
Shuran Zhou
Chuandong Xie
Wei Zou
Xiangang Li
74
4
0
19 Apr 2022
Simulation of machine learning-based 6G systems in virtual worlds
A. Oliveira
Felipe Bastos
Isabela Trindade
Walter Frazao
Arthur Nascimento
D. Gomes
Francisco Muller
A. Klautau
25
1
0
15 Apr 2022
Do You Really Mean That? Content Driven Audio-Visual Deepfake Dataset and Multimodal Method for Temporal Forgery Localization
Zhixi Cai
Kalin Stefanov
Abhinav Dhall
Munawar Hayat
70
3
0
13 Apr 2022
Fine-grained Noise Control for Multispeaker Speech Synthesis
Karolos Nikitaras
G. Vamvoukakis
Nikolaos Ellinas
Konstantinos Klapsas
K. Markopoulos
S. Raptis
June Sig Sung
Gunu Jho
Aimilios Chalamandaris
Pirros Tsiakoulis
69
5
0
11 Apr 2022
Karaoker: Alignment-free singing voice synthesis with speech training data
Panos Kakoulidis
Nikolaos Ellinas
G. Vamvoukakis
K. Markopoulos
June Sig Sung
Gunu Jho
Pirros Tsiakoulis
Aimilios Chalamandaris
105
3
0
08 Apr 2022
Hierarchical and Multi-Scale Variational Autoencoder for Diverse and Natural Non-Autoregressive Text-to-Speech
Jaesung Bae
Jinhyeok Yang
Taejun Bak
Young-Sun Joo
DiffM
126
6
0
08 Apr 2022
The Sillwood Technologies System for the VoiceMOS Challenge 2022
Jiameng Gao
59
0
0
08 Apr 2022
Heterogeneous Target Speech Separation
Hyunjae Cho
Wonbin Jung
Junhyeok Lee
Paris Smaragdis
Sanghyun Woo
92
26
0
07 Apr 2022
Self-supervised learning for robust voice cloning
Konstantinos Klapsas
Nikolaos Ellinas
Karolos Nikitaras
G. Vamvoukakis
Panos Kakoulidis
...
S. Raptis
June Sig Sung
Gunu Jho
Aimilios Chalamandaris
Pirros Tsiakoulis
SSL
84
6
0
07 Apr 2022
Arabic Text-To-Speech (TTS) Data Preparation
Hala Al Masri
Muhy Eddin Za'ter
28
1
0
07 Apr 2022
SOMOS: The Samsung Open MOS Dataset for the Evaluation of Neural Text-to-Speech Synthesis
Georgia Maniati
Alexandra Vioni
Nikolaos Ellinas
Karolos Nikitaras
Konstantinos Klapsas
June Sig Sung
Gunu Jho
Aimilios Chalamandaris
Pirros Tsiakoulis
75
28
0
06 Apr 2022
Towards Multi-Scale Speaking Style Modelling with Hierarchical Context Information for Mandarin Speech Synthesis
Shunwei Lei
Yixuan Zhou
Liyang Chen
Jiankun Hu
Zhiyong Wu
Shiyin Kang
Helen Meng
79
10
0
06 Apr 2022
Adversarial Learning of Intermediate Acoustic Feature for End-to-End Lightweight Text-to-Speech
Hyungchan Yoon
Seyun Um
Changwhan Kim
Hong-Goo Kang
45
0
0
05 Apr 2022
Into-TTS : Intonation Template Based Prosody Control System
Jihwan Lee
Joun Yeop Lee
Heejin Choi
Seongkyu Mun
Sangjun Park
Jae-Sung Bae
Chanwoo Kim
135
4
0
04 Apr 2022
Multi-modality Associative Bridging through Memory: Speech Sound Recollected from Face Video
Minsu Kim
Joanna Hong
Se Jin Park
Yong Man Ro
CVBM
81
42
0
04 Apr 2022
Content-Dependent Fine-Grained Speaker Embedding for Zero-Shot Speaker Adaptation in Text-to-Speech Synthesis
Yixuan Zhou
Changhe Song
Xiang Li
Lu Zhang
Zhiyong Wu
Yanyao Bian
Jane Polak Scowcroft
Helen Meng
139
23
0
03 Apr 2022
End-to-end model for named entity recognition from speech without paired training data
Salima Mdhaffar
J. Duret
Titouan Parcollet
Yannick Esteve
86
13
0
02 Apr 2022
Residual-guided Personalized Speech Synthesis based on Face Image
Jianrong Wang
Zixuan Wang
Xiaosheng Hu
Xuewei Li
Qiang Fang
Li Liu
CVBM
51
17
0
01 Apr 2022
AdaSpeech 4: Adaptive Text to Speech in Zero-Shot Scenarios
Yihan Wu
Xu Tan
Bohan Li
Lei He
Sheng Zhao
Ruihua Song
Tao Qin
Tie-Yan Liu
VLM
DiffM
85
69
0
01 Apr 2022
Text-To-Speech Data Augmentation for Low Resource Speech Recognition
Rodolfo Zevallos
50
4
0
01 Apr 2022
Universal Adaptor: Converting Mel-Spectrograms Between Different Configurations for Speech Synthesis
Fan Wang
Po-Chun Hsu
Da-Rong Liu
Hung-yi Lee
58
0
0
01 Apr 2022
Data-augmented cross-lingual synthesis in a teacher-student framework
M. D. Korte
Jaebok Kim
A. Kunikoshi
Adaeze Adigwe
E. Klabbers
59
0
0
31 Mar 2022
Previous
1
2
3
...
11
12
13
...
24
25
26
Next