1712.05884
Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions
16 December 2017
Jonathan Shen
Ruoming Pang
Ron J. Weiss
M. Schuster
Navdeep Jaitly
Zongheng Yang
Zhiwen Chen
Yu Zhang
Yuxuan Wang
RJ Skerry-Ryan
Rif A. Saurous
Yannis Agiomyrgiannakis
Yonghui Wu
Papers citing "Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions"
50 / 1,276 papers shown
Title
Evaluating Methods for Ground-Truth-Free Foreign Accent Conversion
Wen-Chin Huang
Tomoki Toda
CVBM
99
5
0
05 Sep 2023
A Comparative Analysis of Pretrained Language Models for Text-to-Speech
M. G. Moya
Panagiota Karanasou
S. Karlapati
Bastian Schnell
Nicole Peinelt
Alexis Moinet
Thomas Drugman
83
3
0
04 Sep 2023
MSM-VC: High-fidelity Source Style Transfer for Non-Parallel Voice Conversion by Multi-scale Style Modeling
Zhichao Wang
Xinsheng Wang
Qicong Xie
Tao Li
Linfu Xie
Qiao Tian
Yuping Wang
114
4
0
03 Sep 2023
Timbre-reserved Adversarial Attack in Speaker Identification
Qing Wang
Jixun Yao
Li Zhang
Pengcheng Guo
Linfu Xie
AAML
79
4
0
02 Sep 2023
DiCLET-TTS: Diffusion Model based Cross-lingual Emotion Transfer for Text-to-Speech -- A Study between English and Mandarin
Tao Li
Chenxu Hu
Jian Cong
Xinfa Zhu
Jingbei Li
Qiao Tian
Yuping Wang
Linfu Xie
DiffM
85
9
0
02 Sep 2023
QS-TTS: Towards Semi-Supervised Text-to-Speech Synthesis via Vector-Quantized Self-Supervised Speech Representation Learning
Haohan Guo
Fenglong Xie
Jiawen Kang
Yujia Xiao
Xixin Wu
Helen M. Meng
82
3
0
31 Aug 2023
Towards Spontaneous Style Modeling with Semi-supervised Pre-training for Conversational Text-to-Speech Synthesis
Weiqin Li
Shunwei Lei
Qiaochu Huang
Yixuan Zhou
Zhiyong Wu
Shiyin Kang
Helen Meng
61
4
0
31 Aug 2023
CALM: Contrastive Cross-modal Speaking Style Modeling for Expressive Text-to-Speech Synthesis
Yi Meng
Xiang Li
Zhiyong Wu
Tingtian Li
Zixun Sun
Xinyu Xiao
Chi Sun
Hui Zhan
Helen Meng
62
1
0
30 Aug 2023
The DeepZen Speech Synthesis System for Blizzard Challenge 2023
C. Veaux
R. Maia
Spyridoula Papendreou
87
1
0
30 Aug 2023
Let There Be Sound: Reconstructing High Quality Speech from Silent Videos
Ji-Hoon Kim
Jaehun Kim
Joon Son Chung
59
7
0
29 Aug 2023
Pruning Self-Attention for Zero-Shot Multi-Speaker Text-to-Speech
Hyungchan Yoon
Changhwan Kim
Eunwoo Song
Hyun-Wook Yoon
Hong-Goo Kang
78
1
0
28 Aug 2023
Expressive paragraph text-to-speech synthesis with multi-step variational autoencoder
Xuyuan Li
Zengqiang Shang
Peiyang Shi
Hua Hua
Jian Liu
Pengyuan Zhang
88
0
0
25 Aug 2023
Generalizable Zero-Shot Speaker Adaptive Speech Synthesis with Disentangled Representations
Wen Wang
Yang Song
S. Jha
79
8
0
24 Aug 2023
Sparks of Large Audio Models: A Survey and Outlook
S. Latif
Moazzam Shoukat
Fahad Shamshad
Muhammad Usama
Yi Ren
...
Wenwu Wang
Xulong Zhang
Roberto Togneri
Min Zhang
Björn W. Schuller
LM&MA
AuLLM
188
39
0
24 Aug 2023
The DKU-DUKEECE System for the Manipulation Region Location Task of ADD 2023
Zexin Cai
Weiqing Wang
Yikang Wang
Ming Li
50
7
0
20 Aug 2023
Refining a Deep Learning-based Formant Tracker using Linear Prediction Methods
P. Alku
Sudarsana Reddy Kadiri
Dhananjaya N. Gowda
32
8
0
17 Aug 2023
Long-frame-shift Neural Speech Phase Prediction with Spectral Continuity Enhancement and Interpolation Error Compensation
Yang Ai
Ye-Xin Lu
Zhenhua Ling
84
5
0
17 Aug 2023
Accurate synthesis of Dysarthric Speech for ASR data augmentation
M. Soleymanpour
Michael T. Johnson
Rahim Soleymanpour
J. Berry
86
3
0
16 Aug 2023
Adversarial Training of Denoising Diffusion Model Using Dual Discriminators for High-Fidelity Multi-Speaker TTS
Myeongji Ko
Yong-Hoon Choi
DiffM
72
1
0
03 Aug 2023
Improving grapheme-to-phoneme conversion by learning pronunciations from speech recordings
M. Ribeiro
Giulia Comini
Jaime Lorenzo-Trueba
66
4
0
31 Jul 2023
Audio-visual video-to-speech synthesis with synthesized input audio
Triantafyllos Kefalas
Yannis Panagakis
Maja Pantic
VGen
DiffM
98
1
0
31 Jul 2023
DiffProsody: Diffusion-based Latent Prosody Generation for Expressive Speech Synthesis with Prosody Conditional Adversarial Training
H. Oh
Sang-Hoon Lee
Seong-Whan Lee
DiffM
102
16
0
31 Jul 2023
VITS2: Improving Quality and Efficiency of Single-Stage Text-to-Speech with Adversarial Learning and Architecture Design
Jungil Kong
Jihoon Park
Beomjeong Kim
Jeongmin Kim
Dohee Kong
Sangjin Kim
59
41
0
31 Jul 2023
MSStyleTTS: Multi-Scale Style Modeling with Hierarchical Context Information for Expressive Speech Synthesis
Shunwei Lei
Yixuan Zhou
Liyang Chen
Zhiyong Wu
Xixin Wu
Shiyin Kang
Helen Meng
87
7
0
29 Jul 2023
A Comprehensive Evaluation and Analysis Study for Chinese Spelling Check
Xunjian Yin
Xiao-Yi Wan
ELM
62
3
0
25 Jul 2023
SCRAPS: Speech Contrastive Representations of Acoustic and Phonetic Spaces
Iván Vallés-Pérez
Grzegorz Beringer
Piotr Bilinski
G. Cook
Roberto Barra-Chicote
58
1
0
23 Jul 2023
Signal Reconstruction from Mel-spectrogram Based on Bi-level Consistency of Full-band Magnitude and Phase
Yoshiki Masuyama
Natsuki Ueno
Nobutaka Ono
53
1
0
23 Jul 2023
Backdoor Attacks against Voice Recognition Systems: A Survey
Baochen Yan
Jiahe Lan
Zheng Yan
AAML
80
12
0
23 Jul 2023
SC VALL-E: Style-Controllable Zero-Shot Text to Speech Synthesizer
Daegyeom Kim
Seong-soo Hong
Yong-Hoon Choi
79
2
0
20 Jul 2023
Mega-TTS 2: Boosting Prompting Mechanisms for Zero-Shot Speech Synthesis
Ziyue Jiang
Jinglin Liu
Yi Ren
Jinzheng He
Zhe Ye
...
Pengfei Wei
Chunfeng Wang
Xiang Yin
Zejun Ma
Zhou Zhao
120
52
0
14 Jul 2023
Controllable Emphasis with zero data for text-to-speech
Arnaud Joly
M. Nicolis
Ekaterina Peterova
Alessandro Lombardi
Ammar Abbas
...
Mateusz Lajszczak
Penny Karanasou
Antonio Bonafonte
Thomas Drugman
Elena Sokolova
65
1
0
13 Jul 2023
RobustL2S: Speaker-Specific Lip-to-Speech Synthesis exploiting Self-Supervised Representations
Neha Sahipjohn
Neil Shah
Vishal Tambrahalli
Vineet Gandhi
98
2
0
03 Jul 2023
ContextSpeech: Expressive and Efficient Text-to-Speech for Paragraph Reading
Yujia Xiao
Shaofei Zhang
Xi Wang
Xuejiao Tan
Lei He
Sheng Zhao
Frank Soong
Tan Lee
46
6
0
03 Jul 2023
An End-to-End Multi-Module Audio Deepfake Generation System for ADD Challenge 2023
Sheng Zhao
Qi-ping Yuan
Yibo Duan
Zhuo Chen
41
2
0
03 Jul 2023
SysNoise: Exploring and Benchmarking Training-Deployment System Inconsistency
Yan Wang
Yuhang Li
Ruihao Gong
Aishan Liu
Yanfei Wang
...
Yongqiang Yao
Yunchen Zhang
Tianzi Xiao
F. Yu
Xianglong Liu
AAML
76
0
0
01 Jul 2023
Singing Voice Synthesis Using Differentiable LPC and Glottal-Flow-Inspired Wavetables
Chin-Yun Yu
Gyorgy Fazekas
62
7
0
29 Jun 2023
High-Quality Automatic Voice Over with Accurate Alignment: Supervision through Self-Supervised Discrete Speech Units
Junchen Lu
Berrak Sisman
Mingyang Zhang
Haizhou Li
85
4
0
29 Jun 2023
EmoSpeech: Guiding FastSpeech2 Towards Emotional Text to Speech
Daria Diatlova
V. Shutov
93
9
0
28 Jun 2023
UnitSpeech: Speaker-adaptive Speech Synthesis with Untranscribed Data
Heeseung Kim
Sungwon Kim
Ji-Ran Yeom
Sung-Wan Yoon
DiffM
78
22
0
28 Jun 2023
Large-scale unsupervised audio pre-training for video-to-speech synthesis
Triantafyllos Kefalas
Yannis Panagakis
Maja Pantic
VGen
71
4
0
27 Jun 2023
The Singing Voice Conversion Challenge 2023
Wen-Chin Huang
Lester Phillip Violeta
Songxiang Liu
Jiatong Shi
Tomoki Toda
118
50
0
26 Jun 2023
DSE-TTS: Dual Speaker Embedding for Cross-Lingual Text-to-Speech
Sen Liu
Yiwei Guo
Chenpeng Du
Xie Chen
Kai Yu
88
6
0
25 Jun 2023
Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale
Matt Le
Apoorv Vyas
Bowen Shi
Brian Karrer
Leda Sari
...
Mary Williamson
Vimal Manohar
Yossi Adi
Jay Mahadeokar
Wei-Ning Hsu
AuLLM
130
306
0
23 Jun 2023
Strategies in Transfer Learning for Low-Resource Speech Synthesis: Phone Mapping, Features Input, and Source Language Selection
P. Do
Matt Coler
J. Dijkstra
E. Klabbers
37
3
0
21 Jun 2023
eCat: An End-to-End Model for Multi-Speaker TTS & Many-to-Many Fine-Grained Prosody Transfer
Ammar Abbas
S. Karlapati
Bastian Schnell
Penny Karanasou
M. G. Moya
Amith Nagaraj
Ayman Boustati
Nicole Peinelt
Alexis Moinet
Thomas Drugman
122
3
0
20 Jun 2023
Low-Resource Text-to-Speech Using Specific Data and Noise Augmentation
K. Lakshminarayana
C. Dittmar
N. Pia
Emanuel Habets
53
0
0
16 Jun 2023
Acoustic Identification of Ae. aegypti Mosquitoes using Smartphone Apps and Residual Convolutional Neural Networks
K. Paim
Ricardo Rohweder
M. R. Mendoza
R. Mansilha
Weverton Cordeiro
62
5
0
16 Jun 2023
Power-law Dynamic arising from machine learning
Wei Chen
Weitao Du
Zhi-Ming Ma
Qi Meng
33
0
0
16 Jun 2023
Diff-TTSG: Denoising probabilistic integrated speech and gesture synthesis
Shivam Mehta
Siyang Wang
Simon Alexanderson
Jonas Beskow
Éva Székely
G. Henter
DiffM
103
14
0
15 Jun 2023
Towards Building Voice-based Conversational Recommender Systems: Datasets, Potential Solutions, and Prospects
Xinghua Qu
Hongyang Liu
Zhu Sun
Xiang Yin
Yew-Soon Ong
Lu Lu
Zejun Ma
116
3
0
14 Jun 2023