Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1712.05884
Cited By
v1
v2 (latest)
Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions
16 December 2017
Jonathan Shen
Ruoming Pang
Ron J. Weiss
M. Schuster
Navdeep Jaitly
Zongheng Yang
Zhiwen Chen
Yu Zhang
Yuxuan Wang
RJ Skerry-Ryan
Rif A. Saurous
Yannis Agiomyrgiannakis
Yonghui Wu
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions"
50 / 1,276 papers shown
Title
DQR-TTS: Semi-supervised Text-to-speech Synthesis with Dynamic Quantized Representation
Jiangzong Wang
Pengcheng Li
Xulong Zhang
Ning Cheng
Jing Xiao
81
0
0
14 Nov 2023
CSLP-AE: A Contrastive Split-Latent Permutation Autoencoder Framework for Zero-Shot Electroencephalography Signal Conversion
Anders Vestergaard Norskov
Alexander Neergaard Zahid
Morten Morup
65
3
0
13 Nov 2023
FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores
Daniel Y. Fu
Hermann Kumbong
Eric N. D. Nguyen
Christopher Ré
VLM
100
30
0
10 Nov 2023
Synthetic Speaking Children -- Why We Need Them and How to Make Them
Muhammad Ali Farooq
Dan Bigioi
Rishabh Jain
Wang Yao
Mariam Yiwere
Peter Corcoran
86
0
0
08 Nov 2023
Improved Child Text-to-Speech Synthesis through Fastpitch-based Transfer Learning
Rishabh Jain
Peter Corcoran
53
0
0
07 Nov 2023
Transduce and Speak: Neural Transducer for Text-to-Speech with Semantic Token Prediction
Minchan Kim
Myeonghun Jeong
Byoung Jin Choi
Dongjune Lee
N. Kim
AI4TS
95
12
0
06 Nov 2023
Are cascade dialogue state tracking models speaking out of turn in spoken dialogues?
Lucas Druart
Léo Jacqmin
Benoit Favre
L. Rojas-Barahona
Valentin Vielzeuf
67
0
0
03 Nov 2023
Style Description based Text-to-Speech with Conditional Prosodic Layer Normalization based Diffusion GAN
Neeraj Kumar
Ankur Narang
Brejesh Lall
DiffM
69
0
0
27 Oct 2023
Back Transcription as a Method for Evaluating Robustness of Natural Language Understanding Models to Speech Recognition Errors
Marek Kubis
Pawel Skórzewski
Marcin Sowañski
Tomasz Ziętkiewicz
56
6
0
25 Oct 2023
Generative Pre-training for Speech with Flow Matching
Alexander H. Liu
Matt Le
Apoorv Vyas
Bowen Shi
Andros Tjandra
Wei-Ning Hsu
104
36
0
25 Oct 2023
DPP-TTS: Diversifying prosodic features of speech via determinantal point processes
Seongho Joo
Hyukhun Koh
Kyomin Jung
DiffM
95
0
0
23 Oct 2023
Acoustic BPE for Speech Generation with Discrete Tokens
Feiyu Shen
Yiwei Guo
Chenpeng Du
Xie Chen
Kai Yu
95
13
0
23 Oct 2023
An overview of text-to-speech systems and media applications
Mohammad Reza Hasanabadi
28
3
0
22 Oct 2023
Energy-Based Models For Speech Synthesis
Wanli Sun
Zehai Tu
Anton Ragni
DiffM
67
1
0
19 Oct 2023
On the Relevance of Phoneme Duration Variability of Synthesized Training Data for Automatic Speech Recognition
Nick Rossenbach
Benedikt Hilmes
Ralf Schluter
59
3
0
12 Oct 2023
Sound-skwatter (Did You Mean: Sound-squatter?) AI-powered Generator for Phishing Prevention
R. Valentim
Idilio Drago
Marco Mellia
Federico Cerutti
16
1
0
10 Oct 2023
Prosody Analysis of Audiobooks
Charuta Pethe
Yunting Yin
Felix D Childress
Yunting Yin
Steven Skiena
89
1
0
10 Oct 2023
Generative Spoken Language Model based on continuous word-sized audio tokens
Robin Algayres
Yossi Adi
Tu Nguyen
Jade Copet
Gabriel Synnaeve
Benoît Sagot
Emmanuel Dupoux
AuLLM
119
16
0
08 Oct 2023
Unified speech and gesture synthesis using flow matching
Shivam Mehta
Ruibo Tu
Simon Alexanderson
Jonas Beskow
Éva Székely
G. Henter
100
3
0
08 Oct 2023
Comparative Analysis of Transfer Learning in Deep Learning Text-to-Speech Models on a Few-Shot, Low-Resource, Customized Dataset
Ze Liu
53
1
0
08 Oct 2023
Hate Speech Detection in Limited Data Contexts using Synthetic Data Generation
Aman Khullar
Daniel K. Nkemelu
Cuong V. Nguyen
Michael L. Best
80
5
0
04 Oct 2023
DiffAR: Denoising Diffusion Autoregressive Model for Raw Speech Waveform Generation
Roi Benita
Michael Elad
Joseph Keshet
DiffM
115
8
0
02 Oct 2023
Towards human-like spoken dialogue generation between AI agents from written dialogue
Kentaro Mitsui
Yukiya Hono
Kei Sawada
88
14
0
02 Oct 2023
Low-Resource Self-Supervised Learning with SSL-Enhanced TTS
Xin Wang
Taein Kwon
Wei-Ning Hsu
Yossi Adi
Tu Nguyen
D. Bohus
Emmanuel Dupoux
Neel Joshi
Abdelrahman Mohamed
42
4
0
29 Sep 2023
Privacy-preserving and Privacy-attacking Approaches for Speech and Audio -- A Survey
Yuchen Liu
Apu Kapadia
Donald Williamson
AAML
76
0
0
26 Sep 2023
Deepfake audio as a data augmentation technique for training automatic speech to text transcription models
Alexandre R. Ferreira
Cláudio E. C. Campelo
34
1
0
22 Sep 2023
DurIAN-E: Duration Informed Attention Network For Expressive Text-to-Speech Synthesis
Yu Gu
Yianrao Bian
Guangzhi Lei
Chao Weng
Jane Polak Scowcroft
DiffM
55
2
0
22 Sep 2023
CrossSinger: A Cross-Lingual Multi-Singer High-Fidelity Singing Voice Synthesizer Trained on Monolingual Singers
Xintong Wang
Chang Zeng
Jun Chen
Chunhui Wang
71
6
0
22 Sep 2023
Improving Language Model-Based Zero-Shot Text-to-Speech Synthesis with Multi-Scale Acoustic Prompts
Shunwei Lei
Yixuan Zhou
Liyang Chen
Dan Luo
Zhiyong Wu
...
Shiyin Kang
Tao Jiang
Yahui Zhou
Yuxing Han
Helen M. Meng
VLM
90
2
0
21 Sep 2023
A Discourse-level Multi-scale Prosodic Model for Fine-grained Emotion Analysis
X. Wei
Jia Jia
Xiang Li
Zhiyong Wu
Ziyi Wang
69
1
0
21 Sep 2023
The Impact of Silence on Speech Anti-Spoofing
Yuxiang Zhang
Zhuo Li
Jingze Lu
Hua Hua
Wenchao Wang
Pengyuan Zhang
80
21
0
21 Sep 2023
Emotion-Aware Prosodic Phrasing for Expressive Text-to-Speech
Rui Liu
Bin Liu
Haizhou Li
53
3
0
21 Sep 2023
SpeechAlign: a Framework for Speech Translation Alignment Evaluation
Belen Alastruey
Aleix Sant
Gerard I. Gállego
David Dale
Marta R. Costa-jussá
AuLLM
56
3
0
20 Sep 2023
Speak While You Think: Streaming Speech Synthesis During Text Generation
Avihu Dekel
Slava Shechtman
Raul Fernandez
David Haws
Zvi Kons
R. Hoory
64
9
0
20 Sep 2023
HiFTNet: A Fast High-Quality Neural Vocoder with Harmonic-plus-Noise Filter and Inverse Short Time Fourier Transform
Yinghao Aaron Li
Cong Han
Xilin Jiang
N. Mesgarani
101
4
0
18 Sep 2023
Speech Synthesis By Unrolling Diffusion Process using Neural Network Layers
Peter Ochieng
DiffM
56
0
0
18 Sep 2023
Cross-lingual Knowledge Distillation via Flow-based Voice Conversion for Robust Polyglot Text-To-Speech
Dariusz Piotrowski
Renard Korzeniowski
Alessio Falai
Sebastian Cygert
Kamil Pokora
Georgi Tinchev
Ziyao Zhang
K. Yanagisawa
72
1
0
15 Sep 2023
Syn-Att: Synthetic Speech Attribution via Semi-Supervised Unknown Multi-Class Ensemble of CNNs
Md Awsafur Rahman
Bishmoy Paul
Najibul Haque Sarker
Zaber Ibn Abdul Hakim
S. Fattah
Mohammad Saquib
59
3
0
15 Sep 2023
PromptTTS++: Controlling Speaker Identity in Prompt-Based Text-to-Speech Using Natural Language Descriptions
Reo Shimizu
Ryuichi Yamamoto
Masaya Kawamura
Yuma Shirahata
Hironori Doi
Tatsuya Komatsu
Kentaro Tachibana
DiffM
95
25
0
15 Sep 2023
Diversity-based core-set selection for text-to-speech with linguistic and acoustic features
Kentaro Seki
Shinnosuke Takamichi
Takaaki Saeki
Hiroshi Saruwatari
76
4
0
15 Sep 2023
Direct Text to Speech Translation System using Acoustic Units
Victoria Mingote
Pablo Gimeno
Luis Vicente
Sameer Khurana
Antoine Laurent
J. Duret
55
4
0
14 Sep 2023
Voxtlm: unified decoder-only models for consolidating speech recognition/synthesis and speech/text continuation tasks
Soumi Maiti
Yifan Peng
Shukjae Choi
Jee-weon Jung
Xuankai Chang
Shinji Watanabe
VLM
AuLLM
125
69
0
14 Sep 2023
MASTERKEY: Practical Backdoor Attack Against Speaker Verification Systems
Hanqing Guo
Xun Chen
Junfeng Guo
Li Xiao
Qiben Yan
84
13
0
13 Sep 2023
DCTTS: Discrete Diffusion Model with Contrastive Learning for Text-to-speech Generation
Zhichao Wu
Qiulin Li
Sixing Liu
Qun Yang
70
3
0
13 Sep 2023
Distinguishing Neural Speech Synthesis Models Through Fingerprints in Speech Waveforms
Chu Yuan Zhang
Jiangyan Yi
Jianhua Tao
Chenglong Wang
Xinrui Yan
87
8
0
13 Sep 2023
CleanUNet 2: A Hybrid Speech Denoising Model on Waveform and Spectrogram
Zhifeng Kong
Ming-Yu Liu
Ambrish Dantrey
Bryan Catanzaro
51
7
0
12 Sep 2023
Cross-Utterance Conditioned VAE for Speech Generation
Yongqian Li
Cheng Yu
Guangzhi Sun
Weiqin Zu
Zheng Tian
...
Wei Pan
Chao Zhang
Jun Wang
Yang Yang
Fanglei Sun
66
2
0
08 Sep 2023
BigVSAN: Enhancing GAN-based Neural Vocoders with Slicing Adversarial Network
Takashi Shibuya
Yuhta Takida
Yuki Mitsufuji
71
11
0
06 Sep 2023
MuLanTTS: The Microsoft Speech Synthesis System for Blizzard Challenge 2023
Zhihang Xu
Shaofei Zhang
Xi Wang
Jiajun Zhang
Wenning Wei
Lei He
Sheng Zhao
81
2
0
06 Sep 2023
Voice Morphing: Two Identities in One Voice
Sushant Pani
Anurag Chowdhury
Morgan Sandler
Arun Ross
79
1
0
05 Sep 2023
Previous
1
2
3
4
5
6
...
24
25
26
Next