
End-to-End Adversarial Text-to-Speech

5 June 2020

Papers citing "End-to-End Adversarial Text-to-Speech"

50 / 114 papers shown

Muyan-TTS: A Trainable Text-to-Speech Model Optimized for Podcast Scenarios with a $50K Budget

933

27 Apr 2025

P2Mark: Plug-and-play Parameter-level Watermarking for Neural Speech Generation

439

07 Apr 2025

Memory-Centric Computing: Recent Advances in Processing-in-DRAM

366

26 Dec 2024

SegINR: Segment-wise Implicit Neural Representation for Sequence Alignment in Neural Text-to-Speech

IEEE Signal Processing Letters (SPL), 2024

Minchan Kim

Myeonghun Jeong

Joun Yeop Lee

Nam Soo Kim

221

07 Oct 2024

SSDM: Scalable Speech Dysfluency Modeling

Neural Information Processing Systems (NeurIPS), 2024

Xuanru Zhou

Gopala Anumanchipalli

AuLLM

332

29 Aug 2024

Central Kurdish Text-to-Speech Synthesis with Novel End-to-End Transformer Training

Hawraz A. Ahmad

Tarik A. Rashid

293

06 Aug 2024

A Survey of Deep Learning Audio Generation Methods

Matej Bozic

Marko Horvat

VLM MedIm

351

31 May 2024

RapVerse: Coherent Vocals and Whole-Body Motions Generations from Text

357

30 May 2024

CM-TTS: Enhancing Real Time Text-to-Speech Synthesis Efficiency through Weighted Samplers and Consistency Models

182

31 Mar 2024

PAVITS: Exploring Prosody-aware VITS for End-to-End Emotional Voice Conversion

Wenming Zheng

184

03 Mar 2024

Amphion: An Open-Source Audio, Music and Speech Generation Toolkit

Spoken Language Technology Workshop (SLT), 2023

Xueyao Zhang

Liumeng Xue

Yicheng Gu

Yuancheng Wang

Haorui He

...

Haizhou Li

366

15 Dec 2023

DQR-TTS: Semi-supervised Text-to-speech Synthesis with Dynamic Quantized Representation

385

14 Nov 2023

E3 TTS: Easy End-to-End Diffusion-based Text to Speech

Automatic Speech Recognition & Understanding (ASRU), 2023

357

02 Nov 2023

The IMS Toucan System for the Blizzard Challenge 2023

265

26 Oct 2023

DPP-TTS: Diversifying prosodic features of speech via determinantal point processes

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

347

23 Oct 2023

An overview of text-to-speech systems and media applications

Mohammad Reza Hasanabadi

140

22 Oct 2023

DiffAR: Denoising Diffusion Autoregressive Model for Raw Speech Waveform Generation

International Conference on Learning Representations (ICLR), 2023

630

02 Oct 2023

FastGraphTTS: An Ultrafast Syntax-Aware Speech Synthesis Framework

IEEE International Conference on Tools with Artificial Intelligence (ICTAI), 2023

211

16 Sep 2023

QS-TTS: Towards Semi-Supervised Text-to-Speech Synthesis via Vector-Quantized Self-Supervised Speech Representation Learning

IEEE Transactions on Audio, Speech, and Language Processing (TASLP), 2023

191

31 Aug 2023

Towards Improving the Expressiveness of Singing Voice Synthesis with BERT Derived Semantic Information

Interspeech (Interspeech), 2022

Zhiyong Wu

Shiyin Kang

Helen Meng

272

31 Aug 2023

iSTFTNet2: Faster and More Lightweight iSTFT-Based Neural Vocoder Using 1D-2D CNN

Interspeech (Interspeech), 2023

205

14 Aug 2023

VITS2: Improving Quality and Efficiency of Single-Stage Text-to-Speech with Adversarial Learning and Architecture Design

Interspeech (Interspeech), 2023

379

31 Jul 2023

eCat: An End-to-End Model for Multi-Speaker TTS & Many-to-Many Fine-Grained Prosody Transfer

Interspeech (Interspeech), 2023

346

20 Jun 2023

StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models

Neural Information Processing Systems (NeurIPS), 2023

Cong Han

370

241

13 Jun 2023

PauseSpeech: Natural Speech Synthesis via Pre-trained Language Model and Pause-based Prosody Modeling

Asian Conference on Pattern Recognition (ACPR), 2023

Ji-Sang Hwang

Sang-Hoon Lee

Seong-Whan Lee

242

13 Jun 2023

HiddenSinger: High-Quality Singing Voice Synthesis via Neural Audio Codec and Latent Diffusion Models

Neural Networks (Neural Netw.), 2023

228

12 Jun 2023

The Age of Synthetic Realities: Challenges and Opportunities

APSIPA Transactions on Signal and Information Processing (TASIP), 2023

Shiqi Wang

Anderson de Rezende Rocha

DeLMO

330

09 Jun 2023

Mega-TTS: Zero-Shot Text-to-Speech at Scale with Intrinsic Inductive Bias

...

Rongjie Huang

Chunfeng Wang

Xiang Yin

Zejun Ma

Zhou Zhao

DiffM

306

06 Jun 2023

Towards Robust FastSpeech 2 by Modelling Residual Multimodality

Interspeech (Interspeech), 2023

Fabian Kögel

Bac Nguyen

Fabien Cardinaux

189

02 Jun 2023

OTW: Optimal Transport Warping for Time Series

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

224

01 Jun 2023

DC CoMix TTS: An End-to-End Expressive TTS with Discrete Code Collaborated with Mixer

Interspeech (Interspeech), 2023

Yerin Choi

M. Koo

442

31 May 2023

Make-A-Voice: Unified Voice Synthesis With Discrete Representation

Rongjie Huang

Dongchao Yang

Zhou Zhao

Dong Yu

DiffM

224

30 May 2023

ViT-TTS: Visual Text-to-Speech with Scalable Diffusion Transformer

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

Rongjie Huang

Zhou Zhao

403

22 May 2023

CLAPSpeech: Learning Prosody from Text Context with Contrastive Language-Audio Pre-training

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

Rongjie Huang

Xiang Yin

Zhou Zhao

CLIP

182

18 May 2023

RMSSinger: Realistic-Music-Score based Singing Voice Synthesis

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

Rongjie Huang

Zhou Zhao

293

18 May 2023

Wave-U-Net Discriminator: Fast and Lightweight Discriminator for Generative Adversarial Network-Based Speech Synthesis

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

221

24 Mar 2023

A Survey on Audio Diffusion Models: Text To Speech Synthesis and Enhancement in Generative AI

Mengchun Zhang

In So Kweon

339

110

23 Mar 2023

An End-to-End Neural Network for Image-to-Audio Transformation

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

Liu Chen

Michael Deisher

Munir Georges

193

10 Mar 2023

FoundationTTS: Text-to-Speech for ASR Customization with Generative Language Model

372

06 Mar 2023

PITS: Variational Pitch Inference without Fundamental Frequency for End-to-End Pitch-controllable TTS

435

24 Feb 2023

Unsupervised Data Selection for TTS: Using Arabic Broadcast News as a Case Study

237

22 Jan 2023

SNAC: Speaker-normalized affine coupling layer in flow-based architecture for zero-shot multi-speaker text-to-speech

IEEE Signal Processing Letters (SPL), 2022

223

30 Nov 2022

Deep Fake Detection, Deterrence and Response: Challenges and Opportunities

Amin Azmoodeh

Ali Dehghantanha

219

26 Nov 2022

NANSY++: Unified Voice Synthesis with Neural Analysis and Synthesis

International Conference on Learning Representations (ICLR), 2022

Hyeong-Seok Choi

Jinhyeok Yang

Juheon Lee

Hyeongju Kim

363

17 Nov 2022

DSPGAN: a GAN-based universal vocoder for high-fidelity TTS by time-frequency domain supervision from DSP

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022

Jian Cong

269

02 Nov 2022

Learning utterance-level representations through token-level acoustic latents prediction for Expressive Speech Synthesis

Aimilios Chalamandaris

Pirros Tsiakoulis

262

01 Nov 2022

Uncertainty-DTW for Time Series and Sequences

European Conference on Computer Vision (ECCV), 2022

Lei Wang

Piotr Koniusz

347

30 Oct 2022

Period VITS: Variational Inference with Explicit Pitch Modeling for End-to-end Emotional Speech Synthesis

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022

299

28 Oct 2022

Improving robustness of spontaneous speech synthesis with linguistic speech regularization and pseudo-filled-pause insertion

Speech Synthesis Workshop (SSW), 2022

Yuta Matsunaga

Takaaki Saeki

Shinnosuke Takamichi

Hiroshi Saruwatari

343

18 Oct 2022

Empirical Study Incorporating Linguistic Knowledge on Filled Pauses for Personalized Spontaneous Speech Synthesis

Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 2022

Yuta Matsunaga

Takaaki Saeki

Shinnosuke Takamichi

Hiroshi Saruwatari

295

14 Oct 2022