arXiv 1712.05884 (v2, latest)
Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions
16 December 2017
Jonathan Shen
Ruoming Pang
Ron J. Weiss
M. Schuster
Navdeep Jaitly
Zongheng Yang
Zhiwen Chen
Yu Zhang
Yuxuan Wang
RJ Skerry-Ryan
Rif A. Saurous
Yannis Agiomyrgiannakis
Yonghui Wu
Papers citing "Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions"
50 / 1,276 papers shown
Title
RapFlow-TTS: Rapid and High-Fidelity Text-to-Speech with Improved Consistency Flow Matching
Hyun Joon Park
Jeongmin Liu
Jin Sob Kim
Jeong Yeol Yang
Sung Won Han
Eunwoo Song
15
0
0
20 Jun 2025
TTSOps: A Closed-Loop Corpus Optimization Framework for Training Multi-Speaker TTS Models from Dark Data
Kentaro Seki
Shinnosuke Takamichi
Takaaki Saeki
Hiroshi Saruwatari
38
0
0
18 Jun 2025
ZipVoice: Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching
Han Zhu
Wei Kang
Zengwei Yao
Liyong Guo
Fangjun Kuang
Zhaoqing Li
Weiji Zhuang
Long Lin
Daniel Povey
52
0
0
16 Jun 2025
Superposed Parameterised Quantum Circuits
Viktoria Patapovich
Mo Kordzanganeh
A. Melnikov
23
0
0
10 Jun 2025
XMAD-Bench: Cross-Domain Multilingual Audio Deepfake Benchmark
Ioan-Paul Ciobanu
Andrei Iulian Hiji
Nicolae-Cătălin Ristea
Paul Irofti
Cristian Rusu
Radu Tudor Ionescu
32
0
0
31 May 2025
Voice Adaptation for Swiss German
Samuel Stucki
Jan Deriu
Mark Cieliebak
28
0
0
28 May 2025
A Linguistically Motivated Analysis of Intonational Phrasing in Text-to-Speech Systems: Revealing Gaps in Syntactic Sensitivity
Charlotte Pouw
Afra Alishahi
Willem H. Zuidema
28
0
0
28 May 2025
VoiceMark: Zero-Shot Voice Cloning-Resistant Watermarking Approach Leveraging Speaker-Specific Latents
Haiyun Li
Zhiyong Wu
Xiaofeng Xie
Jingran Xie
Yaoxun Xu
Hanyang Peng
45
0
0
27 May 2025
STOPA: A Database of Systematic VariaTion Of DeePfake Audio for Open-Set Source Tracing and Attribution
Anton Firc
Manasi Chibber
Jagabandhu Mishra
Vishwanath Pratap Singh
Tomi Kinnunen
K. Malinka
167
0
0
26 May 2025
Novel Loss-Enhanced Universal Adversarial Patches for Sustainable Speaker Privacy
Elvir Karimov
Alexander Varlamov
Danil Ivanov
Dmitrii Korzh
Oleg Y. Rogov
AAML
39
0
0
26 May 2025
MPE-TTS: Customized Emotion Zero-Shot Text-To-Speech Using Multi-Modal Prompt
Zhichao Wu
Yueteng Kang
Songjun Cao
Long Ma
Qiulin Li
Qun Yang
DiffM
57
0
0
24 May 2025
CosyVoice 3: Towards In-the-wild Speech Generation via Scaling-up and Post-training
Zhihao Du
Changfeng Gao
Yuxuan Wang
Fan Yu
Tianyu Zhao
...
Mengzhe Chen
Yafeng Chen
Shiliang Zhang
Wen Wang
Jieping Ye
AuLLM
145
1
0
23 May 2025
Differentiable K-means for Fully-optimized Discrete Token-based ASR
Kentaro Onda
Yosuke Kashiwagi
E. Tsunoo
Hayato Futami
Shinji Watanabe
71
0
0
22 May 2025
Prosodically Enhanced Foreign Accent Simulation by Discrete Token-based Resynthesis Only with Native Speech Corpora
Kentaro Onda
Keisuke Imoto
Satoru Fukayama
Daisuke Saito
Nobuaki Minematsu
26
0
0
22 May 2025
More-than-Human Storytelling: Designing Longitudinal Narrative Engagements with Generative AI
Émilie Fabre
Katie Seaborn
Shuta Koiwai
Mizuki Watanabe
Paul Riesch
AI4CE
41
1
0
20 May 2025
FMSD-TTS: Few-shot Multi-Speaker Multi-Dialect Text-to-Speech Synthesis for Ü-Tsang, Amdo and Kham Speech Dataset Generation
Yutong Liu
Ziyue Zhang
Ban Ma-bao
Yuqing Cai
Yongbin Yu
Renzeng Duojie
Xiangxiang Wang
Fan Gao
Cheng Huang
Nyima Tashi
63
1
0
20 May 2025
DPN-GAN: Inducing Periodic Activations in Generative Adversarial Networks for High-Fidelity Audio Synthesis
Zeeshan Ahmad
Shudi Bao
Meng Chen
56
0
0
14 May 2025
Lightweight End-to-end Text-to-speech Synthesis for low resource on-device applications
Biel Tura Vecino
Adam Gabryś
Daniel Mątwicki
Andrzej Pomirski
Tom Iddon
Marius Cotescu
Jaime Lorenzo-Trueba
199
3
0
12 May 2025
On the Cost and Benefits of Training Context with Utterance or Full Conversation Training: A Comparative Study
Hyouin Liu
Zhikuan Zhang
70
0
0
12 May 2025
Teochew-Wild: The First In-the-wild Teochew Dataset with Orthographic Annotations
Linrong Pan
Chenglong Jiang
Gaoze Hou
Ying Gao
108
0
0
08 May 2025
Voice Cloning: Comprehensive Survey
Hussam Azzuni
Abdulmotaleb El Saddik
VLM
114
0
0
01 May 2025
Generative Adversarial Network based Voice Conversion: Techniques, Challenges, and Recent Advancements
Sandipan Dhar
N. D. Jana
Swagatam Das
79
0
0
27 Apr 2025
Muyan-TTS: A Trainable Text-to-Speech Model Optimized for Podcast Scenarios with a $50K Budget
Xin Li
Kaikai Jia
Hao Sun
Jun Dai
Z. L. Jiang
433
0
0
27 Apr 2025
FADEL: Uncertainty-aware Fake Audio Detection with Evidential Deep Learning
Ju Yeon Kang
J. Yoon
Semin Kim
Min Hyun Han
Nam Soo Kim
105
0
0
22 Apr 2025
Using Phonemes in cascaded S2S translation pipeline
Rene Pilz
Johannes Schneider
84
0
0
22 Apr 2025
DialogueAgents: A Hybrid Agent-Based Speech Synthesis Framework for Multi-Party Dialogue
Xuzhao Li
Duyi Pan
Hongru Xiao
Jiawei Han
Jing Tang
Jiabao Ma
Wenjie Wang
Bo Cheng
70
1
0
20 Apr 2025
Pseudo-Autoregressive Neural Codec Language Models for Efficient Zero-Shot Text-to-Speech Synthesis
Yifan Yang
Shixuan Liu
Jiajian Li
Yuxuan Hu
Haibin Wu
...
Haiyang Sun
Yanqing Liu
Yan Lu
Kai Yu
Xie Chen
118
1
0
14 Apr 2025
AMNet: An Acoustic Model Network for Enhanced Mandarin Speech Synthesis
Yubing Cao
Yinfeng Yu
Yongming Li
Liejun Wang
67
0
0
12 Apr 2025
Generalized Multilingual Text-to-Speech Generation with Language-Aware Style Adaptation
Haowei Lou
Hye-Young Paik
Sheng Li
Wen Hu
Lina Yao
70
1
0
11 Apr 2025
On The Landscape of Spoken Language Models: A Comprehensive Survey
Siddhant Arora
Kai-Wei Chang
Chung-Ming Chien
Yifan Peng
Haibin Wu
Yossi Adi
Emmanuel Dupoux
Hung-yi Lee
Karen Livescu
Shinji Watanabe
155
14
0
11 Apr 2025
SlimSpeech: Lightweight and Efficient Text-to-Speech with Slim Rectified Flow
Kaidi Wang
Wenhao Guan
Shenghui Lu
Jianglong Yao
Lin Li
Q. Hong
187
3
0
10 Apr 2025
P2Mark: Plug-and-play Parameter-level Watermarking for Neural Speech Generation
Yong Ren
Jiangyan Yi
Tao Wang
J. Tao
Zhengqi Wen
Chenxing Li
Zheng Lian
Ruibo Fu
Ye Bai
Xiaohui Zhang
102
0
0
07 Apr 2025
ReverBERT: A State Space Model for Efficient Text-Driven Speech Style Transfer
Michael Brown
Sofia Martinez
Priya Singh
72
0
0
26 Mar 2025
DiffCSS: Diverse and Expressive Conversational Speech Synthesis with Diffusion Models
Weihao Wu
Zhiwei Lin
Yixuan Zhou
Jingbei Li
Rui Niu
Qinghua Wu
Songjun Cao
Long Ma
Zhiyong Wu
DiffM
86
0
0
27 Feb 2025
Clip-TTS: Contrastive Text-content and Mel-spectrogram, A High-Quality Text-to-Speech Method based on Contextual Semantic Understanding
Tianyun Liu
CLIP
VLM
105
0
0
26 Feb 2025
MegaTTS 3: Sparse Alignment Enhanced Latent Diffusion Transformer for Zero-Shot Speech Synthesis
Ziyue Jiang
Yi Ren
Ruiqi Li
Shengpeng Ji
Zhenhui Ye
...
Yanzhe Zhang
Rui Liu
Xiang Yin
Zhou Zhao
153
0
0
26 Feb 2025
PASS: Presentation Automation for Slide Generation and Speech
Tushar Aggarwal
Aarohi Bhand
107
1
0
17 Jan 2025
Retrieval-Augmented Dialogue Knowledge Aggregation for Expressive Conversational Speech Synthesis
Rui Liu
Zhenqi Jia
F. Bao
Hong Li
77
2
0
11 Jan 2025
TTS-Transducer: End-to-End Speech Synthesis with Neural Transducer
Vladimir Bataev
Subhankar Ghosh
Vitaly Lavrukhin
Jason Chun Lok Li
AI4TS
116
1
0
10 Jan 2025
AccentBox: Towards High-Fidelity Zero-Shot Accent Generation
Jinzuomu Zhong
Korin Richmond
Zhiba Su
Siqi Sun
122
6
0
10 Jan 2025
CrossSpeech++: Cross-lingual Speech Synthesis with Decoupled Language and Speaker Generation
Ji-Hoon Kim
Hong-Sun Yang
Yoon-Cheol Ju
Il-Hwan Kim
Byeong-Yeol Kim
Joon Son Chung
BDL
120
0
0
31 Dec 2024
Generative Landmarks Guided Eyeglasses Removal 3D Face Reconstruction
Dapeng Zhao
Yue Qi
3DH
CVBM
3DV
103
1
0
31 Dec 2024
Revealing the Self: Brainwave-Based Human Trait Identification
M. Islam
Md Nahiyan Uddin
Maoyejatun Hasana
Debojit Pandit
Nafis Mahmud Rahman
Sriram Chellappan
Sami Azam
A. Islam
48
0
0
26 Dec 2024
DiffStyleTTS: Diffusion-based Hierarchical Prosody Modeling for Text-to-Speech with Diverse and Controllable Styles
Jiaxuan Liu
Zhaoci Liu
Yihan Hu
Yingying Gao
Shilei Zhang
Zhenhua Ling
DiffM
115
2
0
04 Dec 2024
Analytic Study of Text-Free Speech Synthesis for Raw Audio using a Self-Supervised Learning Model
Joonyong Park
Daisuke Saito
Nobuaki Minematsu
119
0
0
04 Dec 2024
VQalAttent: a Transparent Speech Generation Pipeline based on Transformer-learned VQ-VAE Latent Space
Armani Rodriguez
S. Kokalj-Filipovic
101
1
0
22 Nov 2024
ESTVocoder: An Excitation-Spectral-Transformed Neural Vocoder Conditioned on Mel Spectrogram
Xiao-Hang Jiang
Hui-Peng Du
Yang Ai
Ye-Xin Lu
Zhen-Hua Ling
81
0
0
18 Nov 2024
RDSinger: Reference-based Diffusion Network for Singing Voice Synthesis
Kehan Sui
Jinxu Xiang
Fang Jin
DiffM
45
0
0
29 Oct 2024
Mitigating Unauthorized Speech Synthesis for Voice Protection
Zhisheng Zhang
Qianyi Yang
Derui Wang
Pengyang Huang
Yuxin Cao
Kai Ye
Jie Hao
AAML
59
3
0
28 Oct 2024
Get Large Language Models Ready to Speak: A Late-fusion Approach for Speech Generation
Maohao Shen
Shun Zhang
Jilong Wu
Zhiping Xiu
Ehab AlBadawy
Yiting Lu
M. Seltzer
Qing He
70
2
0
27 Oct 2024