1710.07654
Cited By
v1
v2
v3 (latest)
Deep Voice 3: Scaling Text-to-Speech with Convolutional Sequence Learning
20 October 2017
Wei Ping
Kainan Peng
Andrew Gibiansky
Sercan O. Arik
Ajay Kannan
Sharan Narang
Jonathan Raiman
John Miller
Papers citing
"Deep Voice 3: Scaling Text-to-Speech with Convolutional Sequence Learning"
50 / 170 papers shown
Title
Step-Audio 2 Technical Report
Boyong Wu
Chao Yan
Chen Hu
Cheng Yi
Chengli Feng
...
Yuanwei Lu
Yuchu Luo
Yuhe Yin
Yumeng Zhan
Y. Zhang
AuLLM
175
29
0
22 Jul 2025
TTSOps: A Closed-Loop Corpus Optimization Framework for Training Multi-Speaker TTS Models from Dark Data
IEEE Transactions on Audio, Speech, and Language Processing (TASLP), 2025
Kentaro Seki
Shinnosuke Takamichi
Takaaki Saeki
Hiroshi Saruwatari
202
2
0
18 Jun 2025
CosyVoice 3: Towards In-the-wild Speech Generation via Scaling-up and Post-training
Zhihao Du
Changfeng Gao
Yuxuan Wang
Fan Yu
Tianyu Zhao
...
Mengzhe Chen
Yafeng Chen
Shiliang Zhang
Wen Wang
Jieping Ye
AuLLM
262
44
0
23 May 2025
DPN-GAN: Inducing Periodic Activations in Generative Adversarial Networks for High-Fidelity Audio Synthesis
IEEE Access (IEEE Access), 2025
Zeeshan Ahmad
Shudi Bao
Meng Chen
149
0
0
14 May 2025
AMNet: An Acoustic Model Network for Enhanced Mandarin Speech Synthesis
Yubing Cao
Yinfeng Yu
Yongming Li
Liejun Wang
123
0
0
12 Apr 2025
A quest through interconnected datasets: lessons from highly-cited ICASSP papers
International Conference on Content-Based Multimedia Indexing (CBMI), 2024
Cynthia C. S. Liem
Doğa Taşcılar
Andrew M. Demetriou
136
0
0
19 Sep 2024
Exploring the Benefits of Tokenization of Discrete Acoustic Units
Interspeech (Interspeech), 2024
Avihu Dekel
Raul Fernandez
115
3
0
08 Jun 2024
XTTS: a Massively Multilingual Zero-Shot Text-to-Speech Model
Interspeech (Interspeech), 2024
Edresson Casanova
Kelly Davis
Eren Golge
Görkem Göknar
Iulian Gulea
...
Aya Aljafari
Joshua Meyer
Reuben Morais
Samuel Olayemi
Julian Weber
VLM
180
189
0
07 Jun 2024
CM-TTS: Enhancing Real Time Text-to-Speech Synthesis Efficiency through Weighted Samplers and Consistency Models
Xiang Li
Fan Bu
Ambuj Mehrish
Yingting Li
Jiale Han
Bo Cheng
Soujanya Poria
DiffM
125
9
0
31 Mar 2024
NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models
Zeqian Ju
Yuancheng Wang
Kai Shen
Xu Tan
Detai Xin
...
Shikun Zhang
Jiang Bian
Lei He
Jinyu Li
Sheng Zhao
DiffM
334
285
0
05 Mar 2024
Towards Accurate Lip-to-Speech Synthesis in-the-Wild
Sindhu B. Hegde
Rudrabha Mukhopadhyay
C. V. Jawahar
Vinay P. Namboodiri
96
12
0
02 Mar 2024
Detecting Voice Cloning Attacks via Timbre Watermarking
Chang-rui Liu
Jie Zhang
Tianwei Zhang
Xi Yang
Weiming Zhang
Neng H. Yu
204
58
0
06 Dec 2023
ELF: Encoding Speaker-Specific Latent Speech Feature for Speech Synthesis
Jungil Kong
Junmo Lee
Jeongmin Kim
Beomjeong Kim
Jihoon Park
Dohee Kong
Changheon Lee
Sangjin Kim
238
3
0
20 Nov 2023
Style Description based Text-to-Speech with Conditional Prosodic Layer Normalization based Diffusion GAN
Neeraj Kumar
Ankur Narang
Brejesh Lall
DiffM
122
0
0
27 Oct 2023
Prosody Analysis of Audiobooks
International Computer Science Conference (ICSC), 2023
Charuta Pethe
Bach Pham
Felix D Childress
Yunting Yin
Steven Skiena
207
2
0
10 Oct 2023
Generative Spoken Language Model based on continuous word-sized audio tokens
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Robin Algayres
Yossi Adi
Tu Nguyen
Jade Copet
Gabriel Synnaeve
Benoît Sagot
Emmanuel Dupoux
AuLLM
219
18
0
08 Oct 2023
U-Style: Cascading U-nets with Multi-level Speaker and Style Modeling for Zero-Shot Voice Cloning
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023
Tao Li
Zhichao Wang
Xinfa Zhu
Jian Cong
Qiao Tian
Yuping Wang
Lei Xie
DiffM
167
7
0
06 Oct 2023
Sparks of Large Audio Models: A Survey and Outlook
S. Latif
Moazzam Shoukat
Fahad Shamshad
Muhammad Usama
Yi Ren
...
Wenwu Wang
Xulong Zhang
Roberto Togneri
Xiaoshi Zhong
Björn W. Schuller
LM&MA
AuLLM
513
49
0
24 Aug 2023
Accurate synthesis of Dysarthric Speech for ASR data augmentation
Speech Communication (Speech Commun.), 2023
M. Soleymanpour
Michael T. Johnson
Rahim Soleymanpour
J. Berry
164
13
0
16 Aug 2023
Uncovering the Deceptions: An Analysis on Audio Spoofing Detection and Future Prospects
International Joint Conference on Artificial Intelligence (IJCAI), 2023
Rishabh Ranjan
Mayank Vatsa
Richa Singh
175
6
0
13 Jul 2023
ContextSpeech: Expressive and Efficient Text-to-Speech for Paragraph Reading
Interspeech (Interspeech), 2023
Yujia Xiao
Shaofei Zhang
Xi Wang
Xuejiao Tan
Lei He
Sheng Zhao
Frank Soong
Tan Lee
138
9
0
03 Jul 2023
A Survey on Audio Diffusion Models: Text To Speech Synthesis and Enhancement in Generative AI
Chenshuang Zhang
Chaoning Zhang
Sheng Zheng
Mengchun Zhang
Maryam Qamar
Sung-Ho Bae
In So Kweon
DiffM
MedIm
208
104
0
23 Mar 2023
Transformers in Speech Processing: A Survey
S. Latif
Aun Zaidi
Heriberto Cuayáhuitl
Fahad Shamshad
Moazzam Shoukat
Muhammad Usama
Junaid Qadir
388
65
0
21 Mar 2023
Emphasizing Unseen Words: New Vocabulary Acquisition for End-to-End Speech Recognition
Neural Networks (Neural Netw.), 2023
Leyuan Qu
C. Weber
S. Wermter
128
12
0
20 Feb 2023
Towards Building Text-To-Speech Systems for the Next Billion Users
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Gokul Karthik Kumar
Praveen S. V.
Pratyush Kumar
Mitesh M. Khapra
Karthik Nandakumar
191
27
0
17 Nov 2022
Adapter-Based Extension of Multi-Speaker Text-to-Speech Model for New Speakers
Interspeech (Interspeech), 2022
Cheng-Ping Hsieh
Subhankar Ghosh
Boris Ginsburg
198
22
0
01 Nov 2022
Text-to-speech synthesis from dark data with evaluation-in-the-loop data selection
Kentaro Seki
Shinnosuke Takamichi
Takaaki Saeki
Hiroshi Saruwatari
225
12
0
26 Oct 2022
The Sound of Silence: Efficiency of First Digit Features in Synthetic Audio Detection
International Workshop on Information Forensics and Security (WIFS), 2022
Daniele Mari
Federica Latora
Simone Milani
89
12
0
06 Oct 2022
Speech Synthesis with Mixed Emotions
IEEE Transactions on Affective Computing (IEEE TAC), 2022
Kun Zhou
Berrak Sisman
R. Rana
B. W. Schuller
Haizhou Li
280
60
0
11 Aug 2022
Detecting Individual Decision-Making Style: Exploring Behavioral Stylometry in Chess
Neural Information Processing Systems (NeurIPS), 2022
Reid McIlroy-Young
Russell Wang
Siddhartha Sen
Jon M. Kleinberg
Ashton Anderson
90
27
0
02 Aug 2022
SoundChoice: Grapheme-to-Phoneme Models with Semantic Disambiguation
Interspeech (Interspeech), 2022
Artem Ploujnikov
Mirco Ravanelli
74
20
0
27 Jul 2022
Controllable Data Generation by Deep Learning: A Review
ACM Computing Surveys (ACM CSUR), 2022
Shiyu Wang
Yuanqi Du
Xiaojie Guo
Bo Pan
Zhaohui Qin
Liang Zhao
493
38
0
19 Jul 2022
Controllable and Lossless Non-Autoregressive End-to-End Text-to-Speech
Zhengxi Liu
Qiao Tian
Chenxu Hu
Xudong Liu
Meng-Che Wu
Yuping Wang
Hang Zhao
Yuxuan Wang
118
12
0
13 Jul 2022
Show Me Your Face, And I'll Tell You How You Speak
Christen Millerdurai
L. A. Khaliq
Timon Ulrich
CVBM
184
1
0
28 Jun 2022
Searching Similarity Measure for Binarized Neural Networks
Yanfei Li
Ang Li
Huimin Yu
119
0
0
05 Jun 2022
NaturalSpeech: End-to-End Text to Speech Synthesis with Human-Level Quality
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Xu Tan
Jiawei Chen
Haohe Liu
Jian Cong
Chen Zhang
...
Lei He
Frank Soong
Tao Qin
Sheng Zhao
Tie-Yan Liu
289
282
0
09 May 2022
A survey on attention mechanisms for medical applications: are we moving towards better algorithms?
IEEE Access (IEEE Access), 2022
Tiago Gonçalves
Isabel Rio-Torto
Luís F. Teixeira
J. S. Cardoso
OOD
MedIm
176
51
0
26 Apr 2022
Hierarchical and Multi-Scale Variational Autoencoder for Diverse and Natural Non-Autoregressive Text-to-Speech
Interspeech (Interspeech), 2022
Jaesung Bae
Jinhyeok Yang
Taejun Bak
Young-Sun Joo
DiffM
221
6
0
08 Apr 2022
Heterogeneous Target Speech Separation
Interspeech (Interspeech), 2022
Hyunjae Cho
Wonbin Jung
Junhyeok Lee
Paris Smaragdis
Sanghyun Woo
136
32
0
07 Apr 2022
Self-supervised learning for robust voice cloning
Interspeech (Interspeech), 2022
Konstantinos Klapsas
Nikolaos Ellinas
Karolos Nikitaras
G. Vamvoukakis
Panos Kakoulidis
...
S. Raptis
June Sig Sung
Gunu Jho
Aimilios Chalamandaris
Pirros Tsiakoulis
SSL
147
7
0
07 Apr 2022
Residual-guided Personalized Speech Synthesis based on Face Image
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Jianrong Wang
Zixuan Wang
Xiaosheng Hu
Xuewei Li
Qiang Fang
Li Liu
CVBM
108
20
0
01 Apr 2022
AdaSpeech 4: Adaptive Text to Speech in Zero-Shot Scenarios
Interspeech (Interspeech), 2022
Yihan Wu
Xu Tan
Bohan Li
Lei He
Sheng Zhao
Ruihua Song
Tao Qin
Tie-Yan Liu
VLM
DiffM
177
75
0
01 Apr 2022
WavThruVec: Latent speech representation as intermediate features for neural speech synthesis
Interspeech (Interspeech), 2022
Hubert Siuzdak
Piotr Dura
Pol van Rijn
Nori Jacoby
AI4TS
323
37
0
31 Mar 2022
Unsupervised Text-to-Speech Synthesis by Unsupervised Automatic Speech Recognition
Interspeech (Interspeech), 2022
Junrui Ni
Liming Wang
Heting Gao
Kaizhi Qian
Yang Zhang
Shiyu Chang
M. Hasegawa-Johnson
112
27
0
29 Mar 2022
Vocal effort modeling in neural TTS for improving the intelligibility of synthetic speech in noise
Interspeech (Interspeech), 2022
T. Raitio
Petko N. Petkov
Jiangchuan Li
M. Shifas
Andrea Davis
Y. Stylianou
95
3
0
20 Mar 2022
Real time spectrogram inversion on mobile phone
Interspeech (Interspeech), 2022
Oleg Rybakov
Marco Tagliasacchi
Yunpeng Li
Liyang Jiang
Xia Zhang
Fadi Biadsy
438
5
0
01 Mar 2022
Revisiting Over-Smoothness in Text to Speech
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Yi Ren
Xu Tan
Tao Qin
Zhou Zhao
Tie-Yan Liu
188
70
0
26 Feb 2022
ProsoSpeech: Enhancing Prosody With Quantized Vector Pre-training in Text-to-Speech
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Yi Ren
Ming Lei
Zhiying Huang
Shi-Rui Zhang
Qian Chen
Zhijie Yan
Zhou Zhao
152
49
0
16 Feb 2022
Synthesizing Dysarthric Speech Using Multi-talker TTS for Dysarthric Speech Recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
M. Soleymanpour
Michael T. Johnson
Rahim Soleymanpour
J. Berry
170
46
0
27 Jan 2022
A two-step backward compatible fullband speech enhancement system
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Xu Zhang
Lianwu Chen
Xiguang Zheng
Xinlei Ren
Chen Zhang
Liang Guo
Bin Yu
227
6
0
26 Jan 2022