AlignTTS: Efficient Feed-Forward Text-to-Speech System without Explicit Alignment

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020

4 March 2020

Papers citing "AlignTTS: Efficient Feed-Forward Text-to-Speech System without Explicit Alignment"

32 / 32 papers shown

AVFakeBench: A Comprehensive Audio-Video Forgery Detection Benchmark for AV-LMMs

297

26 Nov 2025

Eliminating stability hallucinations in LLM-based TTS models via attention guidance

218

24 Sep 2025

Marco-Voice Technical Report

...

263

04 Aug 2025

MORE-3S: Multimodal-based Offline Reinforcement Learning with Shared Semantic Spaces

Ge Zhang

285

20 Feb 2024

Cross-Utterance Conditioned VAE for Speech Generation

IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023

Guangzhi Sun

...

Wei Pan

251

08 Sep 2023

SAR: Self-Supervised Anti-Distortion Representation for End-To-End Speech Model

IEEE International Joint Conference on Neural Network (IJCNN), 2023

340

23 Apr 2023

Semi-Supervised Learning Based on Reference Model for Low-resource TTS

International Conference on Mobile Ad-hoc and Sensor Networks (MSN), 2022

302

25 Oct 2022

Expressive, Variable, and Controllable Duration Modelling in TTS

Interspeech (Interspeech), 2022

207

28 Jun 2022

TDASS: Target Domain Adaptation Speech Synthesis Framework for Multi-speaker Low-Resource TTS

IEEE International Joint Conference on Neural Network (IJCNN), 2022

226

24 May 2022

Cross-Utterance Conditioned VAE for Non-Autoregressive Text-to-Speech

Annual Meeting of the Association for Computational Linguistics (ACL), 2022

Guangzhi Sun

Jun Wang

200

09 May 2022

Regotron: Regularizing the Tacotron2 architecture via monotonic alignment loss

Spoken Language Technology Workshop (SLT), 2022

Efthymios Georgiou

Kosmas Kritsis

Georgios Paraskevopoulos

Athanasios Katsamanis

Vassilis Katsouros

Alexandros Potamianos

380

28 Apr 2022

AutoTTS: End-to-End Text-to-Speech Synthesis through Differentiable Duration Modeling

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022

Bac Nguyen

Fabien Cardinaux

Stefan Uhlich

182

21 Mar 2022

DiffGAN-TTS: High-Fidelity and Efficient Text-to-Speech with Denoising Diffusion GANs

378

28 Jan 2022

PAMA-TTS: Progression-Aware Monotonic Attention for Stable Seq2Seq TTS With Accurate Phoneme Duration Control

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021

Yunchao He

Jian Luan

Yujun Wang

353

09 Oct 2021

Neural HMMs are all you need (for high-quality attention-free TTS)

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021

455

30 Aug 2021

Federated Learning with Dynamic Transformer for Text to Speech

Interspeech (Interspeech), 2021

148

09 Jul 2021

Multi-Scale Spectrogram Modelling for Neural Text-to-Speech

Speech Synthesis Workshop (SS), 2021

200

29 Jun 2021

A Survey on Neural Speech Synthesis

Xu Tan

453

446

29 Jun 2021

UniTTS: Residual Learning of Unified Embedding Space for Speech Style Control

M. Kang

Sungjae Kim

Injung Kim

365

21 Jun 2021

Sprachsynthese -- State-of-the-Art in englischer und deutscher Sprache

René Peinl

173

11 Jun 2021

Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech

International Conference on Machine Learning (ICML), 2021

384

1,242

11 Jun 2021

SpeechNet: A Universal Modularized Model for Speech Processing Tasks

358

07 May 2021

Review of end-to-end speech synthesis technology based on deep learning

234

20 Apr 2021

Fast DCTTS: Efficient Deep Convolutional Text-to-Speech

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021

196

01 Apr 2021

MelGlow: Efficient Waveform Generative Network Based on Location-Variable Convolution

Spoken Language Technology Workshop (SLT), 2020

283

03 Dec 2020

Parallel Tacotron: Non-Autoregressive and Controllable TTS

286

109

22 Oct 2020

End-to-End Text-to-Speech using Latent Duration based on VQ-VAE

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020

Yusuke Yasuda

Xin Wang

Junichi Yamagishi

210

19 Oct 2020

Non-Attentive Tacotron: Robust and Controllable Neural TTS Synthesis Including Unsupervised Duration Modeling

368

114

08 Oct 2020

FastPitch: Parallel Text-to-speech with Pitch Prediction

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020

Adrian Łańcucki

361

404

11 Jun 2020

FastSpeech 2: Fast and High-Quality End-to-End Text to Speech

Xu Tan

Zhou Zhao

728

1,710

08 Jun 2020

Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search

385

602

22 May 2020

JDI-T: Jointly trained Duration Informed Transformer for Text-To-Speech without Explicit Alignment

248

15 May 2020