Parallel Tacotron: Non-Autoregressive and Controllable TTS

22 October 2020

Papers citing "Parallel Tacotron: Non-Autoregressive and Controllable TTS"

50 / 64 papers shown

KALL-E: Autoregressive Speech Synthesis with Next-Distribution Prediction

485

22 Dec 2024

Zero-shot Cross-lingual Voice Transfer for TTS

Bhuvana Ramabhadran

235

20 Sep 2024

VQ-CTAP: Cross-Modal Fine-Grained Sequence Representation Learning for Speech Processing

IEEE Transactions on Audio, Speech, and Language Processing (IEEE TASLP), 2024

...

436

11 Aug 2024

TacoLM: GaTed Attention Equipped Codec Language Model are Efficient Zero-Shot Text to Speech Synthesizers

Yakun Song

Zhuo Chen

Xiaofei Wang

Ziyang Ma

Guanrou Yang

Xie Chen

AuLLM

153

22 Jun 2024

ASTRA: Aligning Speech and Text Representations for ASR without Sampling

391

10 Jun 2024

NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models

Yuancheng Wang

Xu Tan

...

Jiang Bian

565

325

05 Mar 2024

Extending Multilingual Speech Synthesis to 100+ Languages without Transcribed Data

...

Andrew Rosenberg

Bhuvana Ramabhadran

Heiga Zen

Francoise Beaufays

Hadar Shemtov

383

29 Feb 2024

Speech Rhythm-Based Speaker Embeddings Extraction from Phonemes and Phoneme Duration for Multi-Speaker Speech Synthesis

Kenichi Fujita

Atsushi Ando

Yusuke Ijima

102

11 Feb 2024

E3 TTS: Easy End-to-End Diffusion-based Text to Speech

Automatic Speech Recognition & Understanding (ASRU), 2023

361

02 Nov 2023

Prosody Analysis of Audiobooks

International Computer Science Conference (ICSC), 2023

Charuta Pethe

Felix D Childress

Yunting Yin

Steven Skiena

368

10 Oct 2023

High-Fidelity Speech Synthesis with Minimal Supervision: All Using Diffusion Models

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

Hao Li

Longbiao Wang

310

27 Sep 2023

Let There Be Sound: Reconstructing High Quality Speech from Silent Videos

AAAI Conference on Artificial Intelligence (AAAI), 2023

Ji-Hoon Kim

Jaehun Kim

Joon Son Chung

371

29 Aug 2023

Using Text Injection to Improve Recognition of Personal Identifiers in Speech

Interspeech (Interspeech), 2023

Andrew Rosenberg

Zhehuai Chen

Zorik Gekhman

Genady Beryozkin

Parisa Haghani

Bhuvana Ramabhadran

165

14 Aug 2023

Comparing normalizing flows and diffusion models for prosody and acoustic modelling in text-to-speech

Interspeech (Interspeech), 2023

...

Roberto Barra-Chicote

Daniel Korzekwa

Jaime Lorenzo-Trueba

DiffM

256

31 Jul 2023

Minimally-Supervised Speech Synthesis with Conditional Diffusion Model and Language Model: A Comparative Study of Semantic Coding

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

Hao Li

Tao Wang

Longbiao Wang

Jianwu Dang

DiffM

281

28 Jul 2023

GenerTTS: Pronunciation Disentanglement for Timbre and Style Generalization in Cross-Lingual Text-to-Speech

Interspeech (Interspeech), 2023

Xiang Yin

151

27 Jun 2023

LibriTTS-R: A Restored Multi-Speaker Text-to-Speech Corpus

Interspeech (Interspeech), 2023

274

161

30 May 2023

CLAPSpeech: Learning Prosody from Text Context with Contrastive Language-Audio Pre-training

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

Rongjie Huang

Xiang Yin

Zhou Zhao

CLIP

183

18 May 2023

Improving Prosody for Cross-Speaker Style Transfer by Semi-Supervised Style Extractor and Hierarchical Modeling in Speech Synthesis

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

253

14 Mar 2023

An End-to-End Neural Network for Image-to-Audio Transformation

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

Liu Chen

Michael Deisher

Munir Georges

193

10 Mar 2023

Deep Visual Forced Alignment: Learning to Align Transcription with Talking Face Video

AAAI Conference on Artificial Intelligence (AAAI), 2023

Minsu Kim

Chae Won Kim

Y. Ro

CVBM DiffM

204

27 Feb 2023

On granularity of prosodic representations in expressive text-to-speech

Spoken Language Technology Workshop (SLT), 2023

188

26 Jan 2023

Singing Voice Synthesis Based on a Musical Note Position-Aware Attention Mechanism

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022

244

28 Dec 2022

Style-Label-Free: Cross-Speaker Style Transfer by Quantized VAE and Speaker-wise Normalization in Speech Synthesis

International Symposium on Chinese Spoken Language Processing (ISCSLP), 2022

231

13 Dec 2022

Direct Speech-to-speech Translation without Textual Annotation using Bottleneck Features

Junhui Zhang

Junjie Pan

Xiang Yin

Zejun Ma

170

12 Dec 2022

Back-Translation-Style Data Augmentation for Mandarin Chinese Polyphone Disambiguation

Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 2022

162

17 Nov 2022

Learning utterance-level representations through token-level acoustic latents prediction for Expressive Speech Synthesis

Aimilios Chalamandaris

Pirros Tsiakoulis

262

01 Nov 2022

The Importance of Accurate Alignments in End-to-End Speech Synthesis

Automatic Speech Recognition & Understanding (ASRU), 2022

Anusha Prakash

H. Murthy

147

31 Oct 2022

Period VITS: Variational Inference with Explicit Pitch Modeling for End-to-end Emotional Speech Synthesis

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022

305

28 Oct 2022

Virtuoso: Massive Multilingual Speech-Text Joint Semi-Supervised Learning for Text-To-Speech

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022

Zhehuai Chen

Andrew Rosenberg

Bhuvana Ramabhadran

327

27 Oct 2022

Maestro-U: Leveraging joint speech-text representation learning for zero supervised speech ASR

Spoken Language Technology Workshop (SLT), 2022

Zhehuai Chen

Andrew Rosenberg

Bhuvana Ramabhadran

290

18 Oct 2022

WaveFit: An Iterative and Non-autoregressive Neural Vocoder based on Fixed-Point Iteration

Spoken Language Technology Workshop (SLT), 2022

255

03 Oct 2022

Controllable and Lossless Non-Autoregressive End-to-End Text-to-Speech

Hang Zhao

Yuxuan Wang

213

13 Jul 2022

SATTS: Speaker Attractor Text to Speech, Learning to Speak by Learning to Separate

Interspeech (Interspeech), 2022

Nabarun Goswami

Tatsuya Harada

208

13 Jul 2022

DeepGraviLens: a Multi-Modal Architecture for Classifying Gravitational Lensing Data

Nicolò Oreste Pinciroli Vago

Piero Fraternali

404

02 May 2022

Hierarchical and Multi-Scale Variational Autoencoder for Diverse and Natural Non-Autoregressive Text-to-Speech

Interspeech (Interspeech), 2022

349

08 Apr 2022

MAESTRO: Matched Speech Text Representations through Modality Matching

Interspeech (Interspeech), 2022

Zhehuai Chen

Yu Zhang

Andrew Rosenberg

Bhuvana Ramabhadran

Pedro J. Moreno

Ankur Bapna

Heiga Zen

301

120

07 Apr 2022

Adversarial Learning of Intermediate Acoustic Feature for End-to-End Lightweight Text-to-Speech

Interspeech (Interspeech), 2022

225

05 Apr 2022

Universal Adaptor: Converting Mel-Spectrograms Between Different Configurations for Speech Synthesis

335

01 Apr 2022

AutoTTS: End-to-End Text-to-Speech Synthesis through Differentiable Duration Modeling

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022

Bac Nguyen

Fabien Cardinaux

Stefan Uhlich

184

21 Mar 2022

A Multi-Scale Time-Frequency Spectrogram Discriminator for GAN-based Non-Autoregressive TTS

Interspeech (Interspeech), 2022

928

02 Mar 2022

A Review on Methods and Applications in Multimodal Deep Learning

317

175

18 Feb 2022

DiffGAN-TTS: High-Fidelity and Efficient Text-to-Speech with Denoising Diffusion GANs

390

28 Jan 2022

Neural Grapheme-to-Phoneme Conversion with Pre-trained Grapheme Models

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022

220

26 Jan 2022

More than Words: In-the-Wild Visually-Driven Prosody for Text-to-Speech

Computer Vision and Pattern Recognition (CVPR), 2021

Michael Hassid

Michelle Tadmor Ramanovich

252

19 Nov 2021

VRAIN-UPV MLLP's system for the Blizzard Challenge 2021

A. P. D. Martos

Albert Sanchis

Alfons Juan-Císcar

299

29 Oct 2021

DelightfulTTS: The Microsoft Speech Synthesis System for Blizzard Challenge 2021

306

25 Oct 2021

PAMA-TTS: Progression-Aware Monotonic Attention for Stable Seq2Seq TTS With Accurate Phoneme Duration Control

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021

Yunchao He

Jian Luan

Yujun Wang

362

09 Oct 2021

Cross-speaker Emotion Transfer Based on Speaker Condition Layer Normalization and Semi-Supervised Training in Text-To-Speech

Xiang Yin

197

08 Oct 2021

A study on the efficacy of model pre-training in developing neural text-to-speech system

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021

Xu Tan

161

08 Oct 2021