
Deep Voice 3: Scaling Text-to-Speech with Convolutional Sequence Learning

20 October 2017

Sharan Narang

Papers citing "Deep Voice 3: Scaling Text-to-Speech with Convolutional Sequence Learning"

50 / 170 papers shown

Step-Audio 2 Technical Report Boyong Wu Chao Yan Chen Hu Cheng Yi Chengli Feng ... Yuanwei Lu Yuchu Luo Yuhe Yin Yumeng Zhan Y. Zhang AuLLM 175 29 0 22 Jul 2025
TTSOps: A Closed-Loop Corpus Optimization Framework for Training Multi-Speaker TTS Models from Dark Data. IEEE Transactions on Audio, Speech, and Language Processing (TASLP), 2025 Kentaro Seki Shinnosuke Takamichi Takaaki Saeki Hiroshi Saruwatari 202 2 0 18 Jun 2025
CosyVoice 3: Towards In-the-wild Speech Generation via Scaling-up and Post-training Zhihao Du Changfeng Gao Yuxuan Wang Fan Yu Tianyu Zhao ... Mengzhe Chen Yafeng Chen Shiliang Zhang Wen Wang Jieping Ye AuLLM 262 44 0 23 May 2025
DPN-GAN: Inducing Periodic Activations in Generative Adversarial Networks for High-Fidelity Audio Synthesis. IEEE Access (IEEE Access), 2025 Zeeshan Ahmad Shudi Bao Meng Chen 149 0 0 14 May 2025
AMNet: An Acoustic Model Network for Enhanced Mandarin Speech Synthesis Yubing Cao Yinfeng Yu Yongming Li Liejun Wang 123 0 0 12 Apr 2025
A quest through interconnected datasets: lessons from highly-cited ICASSP papers. International Conference on Content-Based Multimedia Indexing (CBMI), 2024 Cynthia C. S. Liem Doğa Taşcılar Andrew M. Demetriou 136 0 0 19 Sep 2024
Exploring the Benefits of Tokenization of Discrete Acoustic Units. Interspeech (Interspeech), 2024 Avihu Dekel Raul Fernandez 115 3 0 08 Jun 2024
XTTS: a Massively Multilingual Zero-Shot Text-to-Speech Model. Interspeech (Interspeech), 2024 Edresson Casanova Kelly Davis Eren Golge Görkem Göknar Iulian Gulea ... Aya Aljafari Joshua Meyer Reuben Morais Samuel Olayemi Julian Weber VLM 180 189 0 07 Jun 2024
CM-TTS: Enhancing Real Time Text-to-Speech Synthesis Efficiency through Weighted Samplers and Consistency Models Xiang Li Fan Bu Ambuj Mehrish Yingting Li Jiale Han Bo Cheng Soujanya Poria DiffM 125 9 0 31 Mar 2024
NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models Zeqian Ju Yuancheng Wang Kai Shen Xu Tan Detai Xin ... Shikun Zhang Jiang Bian Lei He Jinyu Li Sheng Zhao DiffM 334 285 0 05 Mar 2024
Towards Accurate Lip-to-Speech Synthesis in-the-Wild Sindhu B. Hegde Rudrabha Mukhopadhyay C. V. Jawahar Vinay P. Namboodiri 96 12 0 02 Mar 2024
Detecting Voice Cloning Attacks via Timbre Watermarking Chang-rui Liu Jie Zhang Tianwei Zhang Xi Yang Weiming Zhang Neng H. Yu 204 58 0 06 Dec 2023
ELF: Encoding Speaker-Specific Latent Speech Feature for Speech Synthesis Jungil Kong Junmo Lee Jeongmin Kim Beomjeong Kim Jihoon Park Dohee Kong Changheon Lee Sangjin Kim 238 3 0 20 Nov 2023
Style Description based Text-to-Speech with Conditional Prosodic Layer Normalization based Diffusion GAN Neeraj Kumar Ankur Narang Brejesh Lall DiffM 122 0 0 27 Oct 2023
Prosody Analysis of Audiobooks. International Computer Science Conference (ICSC), 2023 Charuta Pethe Yunting Yin Felix D Childress Yunting Yin Steven Skiena 207 2 0 10 Oct 2023
Generative Spoken Language Model based on continuous word-sized audio tokens. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023 Robin Algayres Yossi Adi Tu Nguyen Jade Copet Gabriel Synnaeve Benoît Sagot Emmanuel Dupoux AuLLM 219 18 0 08 Oct 2023
U-Style: Cascading U-nets with Multi-level Speaker and Style Modeling for Zero-Shot Voice Cloning. IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023 Tao Li Zhichao Wang Xinfa Zhu Jian Cong Qiao Tian Yuping Wang Lei Xie DiffM 167 7 0 06 Oct 2023
Sparks of Large Audio Models: A Survey and Outlook S. Latif Moazzam Shoukat Fahad Shamshad Muhammad Usama Yi Ren ... Wenwu Wang Xulong Zhang Roberto Togneri Xiaoshi Zhong Björn W. Schuller LM&MA AuLLM 513 49 0 24 Aug 2023
Accurate synthesis of Dysarthric Speech for ASR data augmentation. Speech Communication (Speech Commun.), 2023 M. Soleymanpour Michael T. Johnson Rahim Soleymanpour J. Berry 164 13 0 16 Aug 2023
Uncovering the Deceptions: An Analysis on Audio Spoofing Detection and Future Prospects. International Joint Conference on Artificial Intelligence (IJCAI), 2023 Rishabh Ranjan Mayank Vatsa Richa Singh 175 6 0 13 Jul 2023
ContextSpeech: Expressive and Efficient Text-to-Speech for Paragraph Reading. Interspeech (Interspeech), 2023 Yujia Xiao Shaofei Zhang Xi Wang Xuejiao Tan Lei He Sheng Zhao Frank Soong Tan Lee 138 9 0 03 Jul 2023
A Survey on Audio Diffusion Models: Text To Speech Synthesis and Enhancement in Generative AI Chenshuang Zhang Chaoning Zhang Sheng Zheng Mengchun Zhang Maryam Qamar Sung-Ho Bae In So Kweon DiffM MedIm 208 104 0 23 Mar 2023
Transformers in Speech Processing: A Survey S. Latif Aun Zaidi Heriberto Cuayáhuitl Fahad Shamshad Moazzam Shoukat Muhammad Usama Junaid Qadir 388 65 0 21 Mar 2023
Emphasizing Unseen Words: New Vocabulary Acquisition for End-to-End Speech Recognition. Neural Networks (Neural Netw.), 2023 Leyuan Qu C. Weber S. Wermter 128 12 0 20 Feb 2023
Towards Building Text-To-Speech Systems for the Next Billion Users. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022 Gokul Karthik Kumar V. PraveenS. Pratyush Kumar Mitesh M. Khapra Karthik Nandakumar 191 27 0 17 Nov 2022
Adapter-Based Extension of Multi-Speaker Text-to-Speech Model for New Speakers. Interspeech (Interspeech), 2022 Cheng-Ping Hsieh Subhankar Ghosh Boris Ginsburg 198 22 0 01 Nov 2022
Text-to-speech synthesis from dark data with evaluation-in-the-loop data selection Kentaro Seki Shinnosuke Takamichi Takaaki Saeki Hiroshi Saruwatari 225 12 0 26 Oct 2022
The Sound of Silence: Efficiency of First Digit Features in Synthetic Audio Detection. International Workshop on Information Forensics and Security (WIFS), 2022 Daniele Mari Federica Latora Simone Milani 89 12 0 06 Oct 2022
Speech Synthesis with Mixed Emotions. IEEE Transactions on Affective Computing (IEEE TAC), 2022 Kun Zhou Berrak Sisman R. Rana B. W. Schuller Haizhou Li 280 60 0 11 Aug 2022
Detecting Individual Decision-Making Style: Exploring Behavioral Stylometry in Chess. Neural Information Processing Systems (NeurIPS), 2022 Reid McIlroy-Young Russell Wang Siddhartha Sen Jon M. Kleinberg Ashton Anderson 90 27 0 02 Aug 2022
SoundChoice: Grapheme-to-Phoneme Models with Semantic Disambiguation. Interspeech (Interspeech), 2022 Artem Ploujnikov Mirco Ravanelli 74 20 0 27 Jul 2022
Controllable Data Generation by Deep Learning: A Review. ACM Computing Surveys (ACM CSUR), 2022 Shiyu Wang Yuanqi Du Xiaojie Guo Bo Pan Zhaohui Qin Bo Pan 493 38 0 19 Jul 2022
Controllable and Lossless Non-Autoregressive End-to-End Text-to-Speech Zhengxi Liu Qiao Tian Chenxu Hu Xudong Liu Meng-Che Wu Yuping Wang Hang Zhao Yuxuan Wang 118 12 0 13 Jul 2022
Show Me Your Face, And I'll Tell You How You Speak Christen Millerdurai L. A. Khaliq Timon Ulrich CVBM 184 1 0 28 Jun 2022
Searching Similarity Measure for Binarized Neural Networks Yanfei Li Ang Li Huimin Yu 119 0 0 05 Jun 2022
NaturalSpeech: End-to-End Text to Speech Synthesis with Human-Level Quality. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022 Xu Tan Jiawei Chen Haohe Liu Jian Cong Chen Zhang ... Lei He Frank Soong Tao Qin Sheng Zhao Tie-Yan Liu 289 282 0 09 May 2022
A survey on attention mechanisms for medical applications: are we moving towards better algorithms? IEEE Access (IEEE Access), 2022 Tiago Gonçalves Isabel Rio-Torto Luís F. Teixeira J. S. Cardoso OOD MedIm 176 51 0 26 Apr 2022
Hierarchical and Multi-Scale Variational Autoencoder for Diverse and Natural Non-Autoregressive Text-to-Speech. Interspeech (Interspeech), 2022 Jaesung Bae Jinhyeok Yang Taejun Bak Young-Sun Joo DiffM 221 6 0 08 Apr 2022
Heterogeneous Target Speech Separation. Interspeech (Interspeech), 2022 Hyunjae Cho Wonbin Jung Junhyeok Lee Paris Smaragdis Sanghyun Woo 136 32 0 07 Apr 2022
Self-supervised learning for robust voice cloning. Interspeech (Interspeech), 2022 Konstantinos Klapsas Nikolaos Ellinas Karolos Nikitaras G. Vamvoukakis Panos Kakoulidis ... S. Raptis June Sig Sung Gunu Jho Aimilios Chalamandaris Pirros Tsiakoulis SSL 147 7 0 07 Apr 2022
Residual-guided Personalized Speech Synthesis based on Face Image. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022 Jianrong Wang Zixuan Wang Xiaosheng Hu Xuewei Li Qiang Fang Li Liu CVBM 108 20 0 01 Apr 2022
AdaSpeech 4: Adaptive Text to Speech in Zero-Shot Scenarios. Interspeech (Interspeech), 2022 Yihan Wu Xu Tan Bohan Li Lei He Sheng Zhao Ruihua Song Tao Qin Tie-Yan Liu VLM DiffM 177 75 0 01 Apr 2022
WavThruVec: Latent speech representation as intermediate features for neural speech synthesis. Interspeech (Interspeech), 2022 Hubert Siuzdak Piotr Dura Pol van Rijn Nori Jacoby AI4TS 323 37 0 31 Mar 2022
Unsupervised Text-to-Speech Synthesis by Unsupervised Automatic Speech Recognition. Interspeech (Interspeech), 2022 Junrui Ni Liming Wang Heting Gao Kaizhi Qian Yang Zhang Shiyu Chang M. Hasegawa-Johnson 112 27 0 29 Mar 2022
Vocal effort modeling in neural TTS for improving the intelligibility of synthetic speech in noise. Interspeech (Interspeech), 2022 T. Raitio Petko N. Petkov Jiangchuan Li M. Shifas Andrea Davis Y. Stylianou 95 3 0 20 Mar 2022
Real time spectrogram inversion on mobile phone. Interspeech (Interspeech), 2022 Oleg Rybakov Marco Tagliasacchi Yunpeng Li Liyang Jiang Xia Zhang Fadi Biadsy 438 5 0 01 Mar 2022
Revisiting Over-Smoothness in Text to Speech. Annual Meeting of the Association for Computational Linguistics (ACL), 2022 Yi Ren Xu Tan Tao Qin Zhou Zhao Tie-Yan Liu 188 70 0 26 Feb 2022
ProsoSpeech: Enhancing Prosody With Quantized Vector Pre-training in Text-to-Speech. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022 Yi Ren Ming Lei Zhiying Huang Shi-Rui Zhang Qian Chen Zhijie Yan Zhou Zhao 152 49 0 16 Feb 2022
Synthesizing Dysarthric Speech Using Multi-talker TTS for Dysarthric Speech Recognition. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022 M. Soleymanpour Michael T. Johnson Rahim Soleymanpour J. Berry 170 46 0 27 Jan 2022
A two-step backward compatible fullband speech enhancement system. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022 Xu Zhang Lianwu Chen Xiguang Zheng Xinlei Ren Chen Zhang Liang Guo Bin Yu 227 6 0 26 Jan 2022
