Deep Voice 3: Scaling Text-to-Speech with Convolutional Sequence Learning

20 October 2017

Sharan Narang

Papers citing "Deep Voice 3: Scaling Text-to-Speech with Convolutional Sequence Learning"

50 / 170 papers shown

Step-Audio 2 Technical Report

...

291

22 Jul 2025

TTSOps: A Closed-Loop Corpus Optimization Framework for Training Multi-Speaker TTS Models from Dark Data

IEEE Transactions on Audio, Speech, and Language Processing (TASLP), 2025

311

18 Jun 2025

CosyVoice 3: Towards In-the-wild Speech Generation via Scaling-up and Post-training

...

342

23 May 2025

DPN-GAN: Inducing Periodic Activations in Generative Adversarial Networks for High-Fidelity Audio Synthesis

IEEE Access, 2025

Zeeshan Ahmad

Shudi Bao

Meng Chen

222

14 May 2025

AMNet: An Acoustic Model Network for Enhanced Mandarin Speech Synthesis

167

12 Apr 2025

A quest through interconnected datasets: lessons from highly-cited ICASSP papers

International Conference on Content-Based Multimedia Indexing (CBMI), 2024

Cynthia C. S. Liem

Doğa Taşcılar

Andrew M. Demetriou

191

19 Sep 2024

Exploring the Benefits of Tokenization of Discrete Acoustic Units

Interspeech, 2024

Avihu Dekel

Raul Fernandez

161

08 Jun 2024

XTTS: a Massively Multilingual Zero-Shot Text-to-Speech Model

Interspeech, 2024

...

276

205

07 Jun 2024

CM-TTS: Enhancing Real Time Text-to-Speech Synthesis Efficiency through Weighted Samplers and Consistency Models

148

31 Mar 2024

NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models

Yuancheng Wang

Xu Tan

...

Jiang Bian

435

288

05 Mar 2024

Towards Accurate Lip-to-Speech Synthesis in-the-Wild

Sindhu B. Hegde

Rudrabha Mukhopadhyay

C. V. Jawahar

Vinay P. Namboodiri

192

02 Mar 2024

Detecting Voice Cloning Attacks via Timbre Watermarking

276

06 Dec 2023

ELF: Encoding Speaker-Specific Latent Speech Feature for Speech Synthesis

296

20 Nov 2023

Style Description based Text-to-Speech with Conditional Prosodic Layer Normalization based Diffusion GAN

174

27 Oct 2023

Prosody Analysis of Audiobooks

International Computer Science Conference (ICSC), 2023

Charuta Pethe

Felix D Childress

Yunting Yin

Steven Skiena

297

10 Oct 2023

Generative Spoken Language Model based on continuous word-sized audio tokens

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

Yossi Adi

271

08 Oct 2023

U-Style: Cascading U-nets with Multi-level Speaker and Style Modeling for Zero-Shot Voice Cloning

IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023

Jian Cong

Lei Xie

183

06 Oct 2023

Sparks of Large Audio Models: A Survey and Outlook

...

Björn W. Schuller

677

24 Aug 2023

Accurate synthesis of Dysarthric Speech for ASR data augmentation

Speech Communication (Speech Commun.), 2023

201

16 Aug 2023

Uncovering the Deceptions: An Analysis on Audio Spoofing Detection and Future Prospects

International Joint Conference on Artificial Intelligence (IJCAI), 2023

Rishabh Ranjan

Mayank Vatsa

Richa Singh

203

13 Jul 2023

ContextSpeech: Expressive and Efficient Text-to-Speech for Paragraph Reading

Interspeech, 2023

178

03 Jul 2023

A Survey on Audio Diffusion Models: Text To Speech Synthesis and Enhancement in Generative AI

Mengchun Zhang

In So Kweon

268

105

23 Mar 2023

Transformers in Speech Processing: A Survey

448

21 Mar 2023

Emphasizing Unseen Words: New Vocabulary Acquisition for End-to-End Speech Recognition

Neural Networks (Neural Netw.), 2023

Leyuan Qu

C. Weber

S. Wermter

146

20 Feb 2023

Towards Building Text-To-Speech Systems for the Next Billion Users

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022

Mitesh M. Khapra

285

17 Nov 2022

Adapter-Based Extension of Multi-Speaker Text-to-Speech Model for New Speakers

Interspeech, 2022

Cheng-Ping Hsieh

Subhankar Ghosh

Boris Ginsburg

233

01 Nov 2022

Text-to-speech synthesis from dark data with evaluation-in-the-loop data selection

Kentaro Seki

Shinnosuke Takamichi

Takaaki Saeki

Hiroshi Saruwatari

283

26 Oct 2022

The Sound of Silence: Efficiency of First Digit Features in Synthetic Audio Detection

International Workshop on Information Forensics and Security (WIFS), 2022

Daniele Mari

Federica Latora

Simone Milani

110

06 Oct 2022

Speech Synthesis with Mixed Emotions

IEEE Transactions on Affective Computing (IEEE TAC), 2022

Haizhou Li

317

11 Aug 2022

Detecting Individual Decision-Making Style: Exploring Behavioral Stylometry in Chess

Neural Information Processing Systems (NeurIPS), 2022

123

02 Aug 2022

SoundChoice: Grapheme-to-Phoneme Models with Semantic Disambiguation

Interspeech, 2022

Artem Ploujnikov

Mirco Ravanelli

27 Jul 2022

Controllable Data Generation by Deep Learning: A Review

ACM Computing Surveys (ACM CSUR), 2022

649

19 Jul 2022

Controllable and Lossless Non-Autoregressive End-to-End Text-to-Speech

Hang Zhao

Yuxuan Wang

159

13 Jul 2022

Show Me Your Face, And I'll Tell You How You Speak

220

28 Jun 2022

Searching Similarity Measure for Binarized Neural Networks

Yanfei Li

Ang Li

Huimin Yu

132

05 Jun 2022

NaturalSpeech: End-to-End Text to Speech Synthesis with Human-Level Quality

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022

Xu Tan

Jian Cong

...

319

290

09 May 2022

A survey on attention mechanisms for medical applications: are we moving towards better algorithms?

IEEE Access, 2022

208

26 Apr 2022

Hierarchical and Multi-Scale Variational Autoencoder for Diverse and Natural Non-Autoregressive Text-to-Speech

Interspeech, 2022

273

08 Apr 2022

Heterogeneous Target Speech Separation

Interspeech, 2022

185

07 Apr 2022

Self-supervised learning for robust voice cloning

Interspeech, 2022

...

Aimilios Chalamandaris

Pirros Tsiakoulis

SSL

207

07 Apr 2022

Residual-guided Personalized Speech Synthesis based on Face Image

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022

113

01 Apr 2022

AdaSpeech 4: Adaptive Text to Speech in Zero-Shot Scenarios

Interspeech, 2022

Xu Tan

204

01 Apr 2022

WavThruVec: Latent speech representation as intermediate features for neural speech synthesis

Interspeech, 2022

349

31 Mar 2022

Unsupervised Text-to-Speech Synthesis by Unsupervised Automatic Speech Recognition

Interspeech, 2022

Kaizhi Qian

137

29 Mar 2022

Vocal effort modeling in neural TTS for improving the intelligibility of synthetic speech in noise

Interspeech, 2022

103

20 Mar 2022

Real time spectrogram inversion on mobile phone

Interspeech, 2022

467

01 Mar 2022

Revisiting Over-Smoothness in Text to Speech

Annual Meeting of the Association for Computational Linguistics (ACL), 2022

Yi Ren

Xu Tan

Tao Qin

Zhou Zhao

Tie-Yan Liu

199

26 Feb 2022

ProsoSpeech: Enhancing Prosody With Quantized Vector Pre-training in Text-to-Speech

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022

Qian Chen

Zhou Zhao

183

16 Feb 2022

Synthesizing Dysarthric Speech Using Multi-talker TTS for Dysarthric Speech Recognition

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022

197

27 Jan 2022

A two-step backward compatible fullband speech enhancement system

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022

251

26 Jan 2022