A$^3$T: Alignment-Aware Acoustic and Text Pretraining for Speech Synthesis and Editing

International Conference on Machine Learning (ICML), 2022

18 March 2022

Papers citing "A$^3$T: Alignment-Aware Acoustic and Text Pretraining for Speech Synthesis and Editing"

Speak, Edit, Repeat: High-Fidelity Voice Editing and Zero-Shot TTS with Cross-Attentive Mamba

Baher Mohammad

Magauiya Zhussip

Stamatios Lefkimmiatis

Mamba

06 Oct 2025

Instance-Specific Test-Time Training for Speech Editing in the Wild

16 Jun 2025

SpeakStream: Streaming Text-to-Speech with Interleaved Data

25 May 2025

Voice Cloning: Comprehensive Survey

Hussam Azzuni

Abdulmotaleb El Saddik

VLM

01 May 2025

SeniorTalk: A Chinese Conversation Dataset with Rich Annotations for Super-Aged Seniors

20 Mar 2025

SSR-Speech: Towards Stable, Safe and Robust Zero-shot Text-based Speech Editing and Synthesis

03 Jan 2025

F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching

Annual Meeting of the Association for Computational Linguistics (ACL), 2024

09 Oct 2024

MDSGen: Fast and Efficient Masked Diffusion Temporal-Aware Transformers for Open-Domain Sound Generation

International Conference on Learning Representations (ICLR), 2024

T. Pham

Tri Ton

Chang D. Yoo

03 Oct 2024

DiffEditor: Enhancing Speech Editing with Semantic Enrichment and Acoustic Consistency

Yang Chen

Yuhang Jia

Shiwan Zhao

Ziyue Jiang

Haoran Li

Jiarong Kang

Yong Qin

19 Sep 2024

Exploring Prediction Targets in Masked Pre-Training for Speech Foundation Models

Li-Wei Chen

Takuya Higuchi

He Bai

Ahmed Hussen Abdelaziz

Alexander Rudnicky

Shinji Watanabe

Tatiana Likhomanenko

B. Theobald

Zakaria Aldeneh

16 Sep 2024

SongCreator: Lyrics-based Universal Song Generation

Neural Information Processing Systems (NeurIPS), 2024

Shun Lei

Zhiyong Wu

Helen Meng

09 Sep 2024

Speech Editing -- a Summary

Tobias Kässmann

Yining Liu

Danni Liu

24 Jul 2024

dMel: Speech Tokenization made Simple

22 Jul 2024

Autoregressive Diffusion Transformer for Text-to-Speech Synthesis

Zhijun Liu

Haizhou Li

08 Jun 2024

VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild

Puyuan Peng

Po-Yao (Bernie) Huang

Daniel Li

Abdelrahman Mohamed

David Harwath

25 Mar 2024

AttentionStitch: How Attention Solves the Speech Editing Problem

Antonios Alexos

Pierre Baldi

05 Mar 2024

Making Flow-Matching-Based Zero-Shot Text-to-Speech Laugh as You Like

...

12 Feb 2024

Proactive Detection of Voice Cloning with Localized Watermarking

Robin San Roman

Pierre Fernandez

30 Jan 2024

FedTabDiff: Federated Learning of Diffusion Probabilistic Models for Synthetic Mixed-Type Tabular Data Generation

11 Jan 2024

Generative Pre-training for Speech with Flow Matching

International Conference on Learning Representations (ICLR), 2023

25 Oct 2023

FluentEditor: Text-based Speech Editing by Considering Acoustic and Prosody Consistency

Interspeech (Interspeech), 2023

21 Sep 2023

Cross-Utterance Conditioned VAE for Speech Generation

IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP), 2023

Guangzhi Sun

...

Wei Pan

08 Sep 2023

SpeechX: Neural Codec Language Model as a Versatile Speech Transformer

IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP), 2023

14 Aug 2023

Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale

Neural Information Processing Systems (NeurIPS), 2023

...

Yossi Adi

23 Jun 2023

Improving Code-Switching and Named Entity Recognition in ASR with Speech Editing based Data Augmentation

Chenpeng Du

Xie Chen

14 Jun 2023

Mega-TTS: Zero-Shot Text-to-Speech at Scale with Intrinsic Inductive Bias

...

Rongjie Huang

Chunfeng Wang

Xiang Yin

Zejun Ma

Zhou Zhao

DiffM

06 Jun 2023

FluentSpeech: Stutter-Oriented Automatic Speech Editing with Context-Aware Diffusion Models

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

Rongjie Huang

Zhou Zhao

23 May 2023

CLAPSpeech: Learning Prosody from Text Context with Contrastive Language-Audio Pre-training

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

Rongjie Huang

Xiang Yin

Zhou Zhao

CLIP

18 May 2023

DiffVoice: Text-to-Speech with Latent Diffusion

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

Zhijun Liu

Yiwei Guo

K. Yu

DiffM

23 Apr 2023

A Comparative Analysis Of Latent Regressor Losses For Singing Voice Conversion

Brendan O'Connor

S. Dixon

27 Feb 2023

Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers

IEEE Transactions on Audio, Speech, and Language Processing (IEEE TASLP), 2023

...

05 Jan 2023

ALCAP: Alignment-Augmented Music Captioner

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022

Weituo Hao

21 Dec 2022

Emotion Selectable End-to-End Text-based Speech Editing

Artificial Intelligence (AI), 2022

Tao Wang

Jiangyan Yi

20 Dec 2022

MaskedSpeech: Context-aware Speech Synthesis with Masking Strategy

Interspeech (Interspeech), 2022

11 Nov 2022

ERNIE-SAT: Speech and Text Joint Pretraining for Cross-Lingual Multi-Speaker Text-to-Speech

...

07 Nov 2022

Virtuoso: Massive Multilingual Speech-Text Joint Semi-Supervised Learning for Text-To-Speech

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022

Zhehuai Chen

Andrew Rosenberg

Bhuvana Ramabhadran

27 Oct 2022

Building Machine Translation Systems for the Next Thousand Languages

...

09 May 2022