1712.05884
Cited By
v1
v2 (latest)
Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions
16 December 2017
Jonathan Shen
Ruoming Pang
Ron J. Weiss
M. Schuster
Navdeep Jaitly
Zongheng Yang
Zhiwen Chen
Yu Zhang
Yuxuan Wang
RJ Skerry-Ryan
Rif A. Saurous
Yannis Agiomyrgiannakis
Yonghui Wu
Papers citing "Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions"
50 / 1,276 papers shown
Title
UniCATS: A Unified Context-Aware Text-to-Speech Framework with Contextual VQ-Diffusion and Vocoding
Chenpeng Du
Yiwei Guo
Feiyu Shen
Zhijun Liu
Zheng Liang
Xie Chen
Shuai Wang
Hui Zhang
K. Yu
DiffM
106
44
0
13 Jun 2023
PauseSpeech: Natural Speech Synthesis via Pre-trained Language Model and Pause-based Prosody Modeling
Ji-Sang Hwang
Sang-Hoon Lee
Seong-Whan Lee
68
4
0
13 Jun 2023
High-Fidelity Audio Compression with Improved RVQGAN
Rithesh Kumar
Prem Seetharaman
Alejandro Luebs
I. Kumar
Kundan Kumar
126
338
0
11 Jun 2023
Mega-TTS: Zero-Shot Text-to-Speech at Scale with Intrinsic Inductive Bias
Ziyue Jiang
Yi Ren
Zhe Ye
Jinglin Liu
Chen Zhang
...
Rongjie Huang
Chunfeng Wang
Xiang Yin
Zejun Ma
Zhou Zhao
DiffM
105
80
0
06 Jun 2023
Rhythm-controllable Attention with High Robustness for Long Sentence Speech Synthesis
Dengfeng Ke
Yayue Deng
Yukang Jia
Jinlong Xue
Qi Luo
Ya Li
Jianqing Sun
Jiaen Liang
Binghuai Lin
39
0
0
05 Jun 2023
Latent Optimal Paths by Gumbel Propagation for Variational Bayesian Dynamic Programming
Xinlei Niu
Christian J. Walder
J. Zhang
Charles Patrick Martin
BDL
29
0
0
05 Jun 2023
Temporal Dynamic Quantization for Diffusion Models
Junhyuk So
Jungwon Lee
Daehyun Ahn
Hyungjun Kim
Eunhyeok Park
DiffM
MQ
108
66
0
04 Jun 2023
Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis
Hubert Siuzdak
134
104
0
01 Jun 2023
How Generative Spoken Language Modeling Encodes Noisy Speech: Investigation from Phonetics to Syntactics
Joonyong Park
Shinnosuke Takamichi
Tomohiko Nakamura
Kentaro Seki
Detai Xin
Hiroshi Saruwatari
AuLLM
37
3
0
01 Jun 2023
The Effects of Input Type and Pronunciation Dictionary Usage in Transfer Learning for Low-Resource Text-to-Speech
P. Do
Matt Coler
J. Dijkstra
E. Klabbers
OffRL
56
0
0
01 Jun 2023
Text-to-Speech Pipeline for Swiss German -- A comparison
Tobias Bollinger
Jan Deriu
Manfred Vogel
DiffM
60
0
0
31 May 2023
XPhoneBERT: A Pre-trained Multilingual Model for Phoneme Representations for Text-to-Speech
L. T. Nguyen
Thinh-Le-Gia Pham
Dat Quoc Nguyen
98
14
0
31 May 2023
LibriTTS-R: A Restored Multi-Speaker Text-to-Speech Corpus
Yuma Koizumi
Heiga Zen
Shigeki Karita
Yifan Ding
Kohei Yatabe
Nobuyuki Morioka
M. Bacchiani
Yu Zhang
Wei Han
Ankur Bapna
114
80
0
30 May 2023
Trustworthy Sensor Fusion against Inaudible Command Attacks in Advanced Driver-Assistance System
Jiwei Guan
Lei Pan
Chen Wang
Shui Yu
Longxiang Gao
Xi Zheng
AAML
63
4
0
30 May 2023
ADAPTERMIX: Exploring the Efficacy of Mixture of Adapters for Low-Resource TTS Adaptation
Ambuj Mehrish
Abhinav Ramesh Kashyap
Yingting Li
Navonil Majumder
Soujanya Poria
75
7
0
29 May 2023
Automatic Evaluation of Turn-taking Cues in Conversational Speech Synthesis
Erik Ekstedt
Siyang Wang
Éva Székely
Joakim Gustafson
Gabriel Skantze
60
8
0
29 May 2023
Stochastic Pitch Prediction Improves the Diversity and Naturalness of Speech in Glow-TTS
Sewade Ogun
Vincent Colotte
Emmanuel Vincent
DiffM
64
4
0
28 May 2023
Diverse and Expressive Speech Prosody Prediction with Denoising Diffusion Probabilistic Model
Xiang Li
Songxiang Liu
Max W. Y. Lam
Zhiyong Wu
Chao Weng
Helen Meng
DiffM
127
5
0
26 May 2023
VioLA: Unified Codec Language Models for Speech Recognition, Synthesis, and Translation
Tianrui Wang
Long Zhou
Zi-Hua Zhang
Yu-Huan Wu
Shujie Liu
Yashesh Gaur
Zhuo Chen
Jinyu Li
Furu Wei
92
106
0
25 May 2023
Multilingual Text-to-Speech Synthesis for Turkic Languages Using Transliteration
Rustem Yeshpanov
Saida Mussakhojayeva
Yerbolat Khassanov
58
3
0
25 May 2023
Spoken Question Answering and Speech Continuation Using Spectrogram-Powered LLM
Eliya Nachmani
Alon Levkovitch
Roy Hirsch
Julián Salazar
Chulayutsh Asawaroengchai
Soroosh Mariooryad
Ehud Rivlin
RJ Skerry-Ryan
Michelle Tadmor Ramanovich
AuLLM
118
45
0
24 May 2023
EfficientSpeech: An On-Device Text to Speech Model
Rowel Atienza
63
4
0
23 May 2023
U-DiT TTS: U-Diffusion Vision Transformer for Text-to-Speech
Xin Jing
Yi Chang
Zijiang Yang
Jiang-jian Xie
Andreas Triantafyllopoulos
Bjoern W. Schuller
99
10
0
22 May 2023
Textually Pretrained Speech Language Models
Michael Hassid
Tal Remez
Tu Nguyen
Itai Gat
Alexis Conneau
...
Alexandre Défossez
Gabriel Synnaeve
Emmanuel Dupoux
Roy Schwartz
Yossi Adi
VLM
SyDa
129
61
0
22 May 2023
VAKTA-SETU: A Speech-to-Speech Machine Translation Service in Select Indic Languages
Shivam Mhaskar
Vineet Bhat
Akshay Batheja
S. Deoghare
Paramveer Choudhary
P. Bhattacharyya
74
5
0
21 May 2023
ComedicSpeech: Text To Speech For Stand-up Comedies in Low-Resource Scenarios
Yuyue Wang
Huanhou Xiao
Yihan Wu
Ruihua Song
43
0
0
20 May 2023
FastFit: Towards Real-Time Iterative Neural Vocoder by Replacing U-Net Encoder With Multiple STFTs
Won Jang
D. Lim
Heayoung Park
88
1
0
18 May 2023
Fusion-S2iGan: An Efficient and Effective Single-Stage Framework for Speech-to-Image Generation
Zhenxing Zhang
Lambert Schomaker
48
3
0
17 May 2023
Integrating Generative Artificial Intelligence in Intelligent Vehicle Systems
Lukas Stappen
J. Dillmann
S. Striegel
Hans-Jörg Vögel
Nicolas Flores-Herr
Björn W. Schuller
69
9
0
15 May 2023
APNet: An All-Frame-Level Neural Vocoder Incorporating Direct Prediction of Amplitude and Phase Spectra
Yang Ai
Zhenhua Ling
101
14
0
13 May 2023
Using Deepfake Technologies for Word Emphasis Detection
Eran Kaufman
Lee-Ad Gottlieb
59
0
0
12 May 2023
VSMask: Defending Against Voice Synthesis Attack via Real-Time Predictive Perturbation
Yuanda Wang
Hanqing Guo
Guangjing Wang
Bocheng Chen
Qiben Yan
AAML
60
18
0
09 May 2023
Learn to Sing by Listening: Building Controllable Virtual Singer by Unsupervised Learning from Voice Recordings
Wei Xue
Yiwen Wang
Qi-fei Liu
Yi-Ting Guo
73
1
0
09 May 2023
Joint Multi-scale Cross-lingual Speaking Style Transfer with Bidirectional Attention Mechanism for Automatic Dubbing
Jingbei Li
Sipan Li
Ping Chen
Lu Zhang
Yi Meng
Zhiyong Wu
Helen Meng
Qiao Tian
Yuping Wang
Yuxuan Wang
79
3
0
09 May 2023
Accented Text-to-Speech Synthesis with Limited Data
Xuehao Zhou
Mingyang Zhang
Yi Zhou
Zhizheng Wu
Haizhou Li
76
15
0
08 May 2023
Block the Label and Noise: An N-Gram Masked Speller for Chinese Spell Checking
Haiyun Yang
111
1
0
05 May 2023
Diverse and Vivid Sound Generation from Text Descriptions
Guangwei Li
Xuenan Xu
Lingfeng Dai
Mengyue Wu
K. Yu
95
4
0
03 May 2023
Glitch in the Matrix: A Large Scale Benchmark for Content Driven Audio-Visual Forgery Detection and Localization
Théophile Cabannes
Shreya Ghosh
Raphaël Marinier
Tom Gedeon
Alexandre M. Bayen
Munawar Hayat
159
29
0
03 May 2023
Environmental sound synthesis from vocal imitations and sound event labels
Yuki Okamoto
Keisuke Imoto
Shinnosuke Takamichi
Ryotaro Nagase
Takahiro Fukumori
Y. Yamashita
49
0
0
29 Apr 2023
Can deepfakes be created by novice users?
Pulak Mehta
Gauri Jagatap
Kevin Gallagher
Brian Timmerman
Progga Deb
S. Garg
Rachel Greenstadt
Brendan Dolan-Gavitt
HAI
69
4
0
28 Apr 2023
TorchBench: Benchmarking PyTorch with High API Surface Coverage
Yueming Hao
Xu Zhao
Bin Bao
David Berard
William Constable
Adnan Aziz
Xu Liu
75
8
0
27 Apr 2023
Source-Filter-Based Generative Adversarial Neural Vocoder for High Fidelity Speech Synthesis
Ye-Xin Lu
Yang Ai
Zhenhua Ling
105
1
0
26 Apr 2023
Zero-shot text-to-speech synthesis conditioned using self-supervised speech representation model
Kenichi Fujita
Takanori Ashihara
Hiroki Kanagawa
Takafumi Moriya
Yusuke Ijima
88
11
0
24 Apr 2023
SAR: Self-Supervised Anti-Distortion Representation for End-To-End Speech Model
Jianzong Wang
Xulong Zhang
Haobin Tang
Aolan Sun
Ning Cheng
Jing Xiao
129
1
0
23 Apr 2023
OLISIA: a Cascade System for Spoken Dialogue State Tracking
Léo Jacqmin
Lucas Druart
Yannick Esteve
Benoit Favre
L. Rojas-Barahona
Valentin Vielzeuf
91
3
0
20 Apr 2023
Affective social anthropomorphic intelligent system
Md. Adyelullahil Mamun
Hasnat Md. Abdullah
Md. Golam Rabiul Alam
Muhammad Mehedi Hassan
Md. Zia Uddin
52
1
0
19 Apr 2023
A Comparison of Semi-Supervised Learning Techniques for Streaming ASR at Scale
Cal Peyser
M. Picheny
Kyunghyun Cho
Rohit Prabhavalkar
Ronny Huang
Tara N. Sainath
AI4TS
49
1
0
19 Apr 2023
NaturalSpeech 2: Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers
Kai Shen
Zeqian Ju
Xu Tan
Yanqing Liu
Yichong Leng
Lei He
Tao Qin
Sheng Zhao
Jiang Bian
DiffM
115
247
0
18 Apr 2023
A Virtual Simulation-Pilot Agent for Training of Air Traffic Controllers
Juan Pablo Zuluaga
Amrutha Prasad
Iuliia Nigmatulina
P. Motlícek
Matthias Kleinert
65
23
0
16 Apr 2023
Context-aware Coherent Speaking Style Prediction with Hierarchical Transformers for Audiobook Speech Synthesis
Shunwei Lei
Yixuan Zhou
Liyang Chen
Zhiyong Wu
Shiyin Kang
Helen Meng
84
6
0
13 Apr 2023