JDI-T: Jointly trained Duration Informed Transformer for Text-To-Speech without Explicit Alignment

15 May 2020

Papers citing "JDI-T: Jointly trained Duration Informed Transformer for Text-To-Speech without Explicit Alignment"

DFADD: The Diffusion and Flow-Matching Based Audio Deepfake Dataset

Spoken Language Technology Workshop (SLT), 2024

Jiawei Du

I-Ming Lin

I-Hsiang Chiu

Xuanjun Chen

Haibin Wu

Wenze Ren

Yu Tsao

Hung-yi Lee

Jyh-Shing Roger Jang

DiffM

305

13 Sep 2024

VNet: A GAN-based Multi-Tier Discriminator Network for Speech Synthesis Vocoders

IEEE International Conference on Systems, Man and Cybernetics (SMC), 2024

Yubing Cao

Yongming Li

Liejun Wang

Yinfeng Yu

177

13 Aug 2024

A Survey of Deep Learning Audio Generation Methods

Matej Bozic

Marko Horvat

VLM MedIm

351

31 May 2024

Intelli-Z: Toward Intelligible Zero-Shot TTS

269

25 Jan 2024

FastFit: Towards Real-Time Iterative Neural Vocoder by Replacing U-Net Encoder With Multiple STFTs

Interspeech, 2023

Won Jang

D. Lim

Heayoung Park

253

18 May 2023

Transformers in Speech Processing: A Survey

515

21 Mar 2023

Virtuoso: Massive Multilingual Speech-Text Joint Semi-Supervised Learning for Text-To-Speech

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022

Zhehuai Chen

Andrew Rosenberg

Bhuvana Ramabhadran

324

27 Oct 2022

JETS: Jointly Training FastSpeech2 and HiFi-GAN for End to End Text to Speech

Interspeech, 2022

D. Lim

Sunghee Jung

Eesung Kim

433

31 Mar 2022

AutoTTS: End-to-End Text-to-Speech Synthesis through Differentiable Duration Modeling

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022

Bac Nguyen

Fabien Cardinaux

Stefan Uhlich

184

21 Mar 2022

PAMA-TTS: Progression-Aware Monotonic Attention for Stable Seq2Seq TTS With Accurate Phoneme Duration Control

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021

Yunchao He

Jian Luan

Yujun Wang

357

09 Oct 2021

PortaSpeech: Portable and High-Quality Generative Text-to-Speech

Yi Ren

Jinglin Liu

Zhou Zhao

436

30 Sep 2021

A Survey on Neural Speech Synthesis

Xu Tan

470

446

29 Jun 2021

UnivNet: A Neural Vocoder with Multi-Resolution Spectrogram Discriminators for High-Fidelity Waveform Generation

Interspeech, 2021

363

190

15 Jun 2021

Review of end-to-end speech synthesis technology based on deep learning

236

20 Apr 2021

LightSpeech: Lightweight and Fast Text to Speech with Neural Architecture Search

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021

Xu Tan

Enhong Chen

164

08 Feb 2021

Triple M: A Practical Text-to-speech Synthesis System With Multi-guidance Attention And Multi-band Multi-time LPCNet

358

30 Jan 2021

Universal MelGAN: A Robust Neural Vocoder for High-Fidelity Waveform Generation in Multiple Domains

Won Jang

D. Lim

Jaesam Yoon

293

19 Nov 2020

Parallel Tacotron: Non-Autoregressive and Controllable TTS

288

109

22 Oct 2020

End-to-End Text-to-Speech using Latent Duration based on VQ-VAE

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020

Yusuke Yasuda

Xin Wang

Junichi Yamagishi

212

19 Oct 2020

Non-Attentive Tacotron: Robust and Controllable Neural TTS Synthesis Including Unsupervised Duration Modeling

370

114

08 Oct 2020

FastSpeech 2: Fast and High-Quality End-to-End Text to Speech

Xu Tan

Zhou Zhao

735

1,710

08 Jun 2020