ResearchTrend.AI
  • Communities
  • Connect sessions
  • AI calendar
  • Organizations
  • Join Slack
  • Contact Sales
Papers
Communities
Social Events
Terms and Conditions
Pricing
Contact Sales
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2026 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2203.09690
  4. Cited By
A$^3$T: Alignment-Aware Acoustic and Text Pretraining for Speech
  Synthesis and Editing
v1v2 (latest)

A3^33T: Alignment-Aware Acoustic and Text Pretraining for Speech Synthesis and Editing

International Conference on Machine Learning (ICML), 2022
18 March 2022
Richard He Bai
Renjie Zheng
Junkun Chen
Xintong Li
Mingbo Ma
Liang Huang
ArXiv (abs)PDFHTMLGithub (89★)

Papers citing "A$^3$T: Alignment-Aware Acoustic and Text Pretraining for Speech Synthesis and Editing"

37 / 37 papers shown
Speak, Edit, Repeat: High-Fidelity Voice Editing and Zero-Shot TTS with Cross-Attentive Mamba
Speak, Edit, Repeat: High-Fidelity Voice Editing and Zero-Shot TTS with Cross-Attentive Mamba
Baher Mohammad
Magauiya Zhussip
Stamatios Lefkimmiatis
Mamba
209
1
0
06 Oct 2025
Instance-Specific Test-Time Training for Speech Editing in the Wild
Instance-Specific Test-Time Training for Speech Editing in the Wild
Taewoo Kim
Uijong Lee
H. Park
Choongsang Cho
Nam In Park
Young Han Lee
282
1
0
16 Jun 2025
SpeakStream: Streaming Text-to-Speech with Interleaved Data
SpeakStream: Streaming Text-to-Speech with Interleaved Data
Richard He Bai
Zijin Gu
Tatiana Likhomanenko
Navdeep Jaitly
AuLLMAI4TS
287
5
0
25 May 2025
Voice Cloning: Comprehensive Survey
Voice Cloning: Comprehensive Survey
Hussam Azzuni
Abdulmotaleb El Saddik
VLM
450
6
0
01 May 2025
SeniorTalk: A Chinese Conversation Dataset with Rich Annotations for Super-Aged Seniors
SeniorTalk: A Chinese Conversation Dataset with Rich Annotations for Super-Aged Seniors
Yang Chen
Hui Wang
Shiyao Wang
Jianfei Chen
Jiabei He
Jiaming Zhou
Xi Yang
Longji Xu
Yonghua Lin
Yong Qin
299
7
0
20 Mar 2025
SSR-Speech: Towards Stable, Safe and Robust Zero-shot Text-based Speech Editing and Synthesis
SSR-Speech: Towards Stable, Safe and Robust Zero-shot Text-based Speech Editing and Synthesis
Helin Wang
Meng Yu
Jiarui Hai
Chen Chen
Yuchen Hu
Rilin Chen
Najim Dehak
Dong Yu
513
16
0
03 Jan 2025
F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching
F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow MatchingAnnual Meeting of the Association for Computational Linguistics (ACL), 2024
Emmanouil Benetos
Zhikang Niu
Ziyang Ma
Keqi Deng
Chunhui Wang
Jian Zhao
Kai Yu
Xie Chen
822
366
0
09 Oct 2024
MDSGen: Fast and Efficient Masked Diffusion Temporal-Aware Transformers for Open-Domain Sound Generation
MDSGen: Fast and Efficient Masked Diffusion Temporal-Aware Transformers for Open-Domain Sound GenerationInternational Conference on Learning Representations (ICLR), 2024
T. Pham
Tri Ton
Chang D. Yoo
390
8
0
03 Oct 2024
DiffEditor: Enhancing Speech Editing with Semantic Enrichment and
  Acoustic Consistency
DiffEditor: Enhancing Speech Editing with Semantic Enrichment and Acoustic Consistency
Yang Chen
Yuhang Jia
Shiwan Zhao
Ziyue Jiang
Haoran Li
Jiarong Kang
Yong Qin
206
3
0
19 Sep 2024
Exploring Prediction Targets in Masked Pre-Training for Speech Foundation Models
Exploring Prediction Targets in Masked Pre-Training for Speech Foundation Models
Li-Wei Chen
Takuya Higuchi
He Bai
Ahmed Hussen Abdelaziz
Alexander Rudnicky
Shinji Watanabe
Tatiana Likhomanenko
B. Theobald
Zakaria Aldeneh
410
1
0
16 Sep 2024
SongCreator: Lyrics-based Universal Song Generation
SongCreator: Lyrics-based Universal Song GenerationNeural Information Processing Systems (NeurIPS), 2024
Shun Lei
Yixuan Zhou
Boshi Tang
Max W. Y. Lam
Feng Liu
Hangyu Liu
Jingcheng Wu
Shiyin Kang
Zhiyong Wu
Helen Meng
365
23
0
09 Sep 2024
Speech Editing -- a Summary
Speech Editing -- a Summary
Tobias Kässmann
Yining Liu
Danni Liu
198
2
0
24 Jul 2024
dMel: Speech Tokenization made Simple
dMel: Speech Tokenization made Simple
Richard He Bai
Tatiana Likhomanenko
Ruixiang Zhang
Zijin Gu
Zakaria Aldeneh
Navdeep Jaitly
398
11
0
22 Jul 2024
Autoregressive Diffusion Transformer for Text-to-Speech Synthesis
Autoregressive Diffusion Transformer for Text-to-Speech Synthesis
Zhijun Liu
Shuai Wang
Sho Inoue
Qibing Bai
Haizhou Li
DiffM
203
41
0
08 Jun 2024
VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild
VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild
Puyuan Peng
Po-Yao (Bernie) Huang
Daniel Li
Abdelrahman Mohamed
David Harwath
561
177
0
25 Mar 2024
AttentionStitch: How Attention Solves the Speech Editing Problem
AttentionStitch: How Attention Solves the Speech Editing Problem
Antonios Alexos
Pierre Baldi
290
3
0
05 Mar 2024
Making Flow-Matching-Based Zero-Shot Text-to-Speech Laugh as You Like
Making Flow-Matching-Based Zero-Shot Text-to-Speech Laugh as You Like
Naoyuki Kanda
Xiaofei Wang
Sefik Emre Eskimez
Manthan Thakker
Hemin Yang
...
Yufei Xia
Jinzhu Li
Yanqing Liu
Sheng Zhao
Michael Zeng
412
17
0
12 Feb 2024
Proactive Detection of Voice Cloning with Localized Watermarking
Proactive Detection of Voice Cloning with Localized Watermarking
Robin San Roman
Pierre Fernandez
Alexandre Défossez
Teddy Furon
Tuan Tran
Hady ElSahar
560
113
0
30 Jan 2024
FedTabDiff: Federated Learning of Diffusion Probabilistic Models for
  Synthetic Mixed-Type Tabular Data Generation
FedTabDiff: Federated Learning of Diffusion Probabilistic Models for Synthetic Mixed-Type Tabular Data Generation
Timur Sattarov
Marco Schreyer
Damian Borth
FedMLDiffMMedIm
251
12
0
11 Jan 2024
Generative Pre-training for Speech with Flow Matching
Generative Pre-training for Speech with Flow MatchingInternational Conference on Learning Representations (ICLR), 2023
Alexander H. Liu
Matt Le
Apoorv Vyas
Bowen Shi
Andros Tjandra
Wei-Ning Hsu
448
65
0
25 Oct 2023
FluentEditor: Text-based Speech Editing by Considering Acoustic and
  Prosody Consistency
FluentEditor: Text-based Speech Editing by Considering Acoustic and Prosody ConsistencyInterspeech (Interspeech), 2023
Rui Liu
Jiatian Xi
Ziyue Jiang
Haizhou Li
438
7
0
21 Sep 2023
Cross-Utterance Conditioned VAE for Speech Generation
Cross-Utterance Conditioned VAE for Speech GenerationIEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023
Yongqian Li
Cheng Yu
Guangzhi Sun
Weiqin Zu
Zheng Tian
...
Wei Pan
Chao Zhang
Jun Wang
Yang Yang
Fanglei Sun
253
4
0
08 Sep 2023
SpeechX: Neural Codec Language Model as a Versatile Speech Transformer
SpeechX: Neural Codec Language Model as a Versatile Speech TransformerIEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023
Xiaofei Wang
Manthan Thakker
Zhuo Chen
Naoyuki Kanda
Sefik Emre Eskimez
Sanyuan Chen
M. Tang
Shujie Liu
Jinyu Li
Takuya Yoshioka
493
118
0
14 Aug 2023
Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale
Voicebox: Text-Guided Multilingual Universal Speech Generation at ScaleNeural Information Processing Systems (NeurIPS), 2023
Matt Le
Apoorv Vyas
Bowen Shi
Brian Karrer
Leda Sari
...
Mary Williamson
Vimal Manohar
Yossi Adi
Jay Mahadeokar
Wei-Ning Hsu
AuLLM
399
478
0
23 Jun 2023
Improving Code-Switching and Named Entity Recognition in ASR with Speech
  Editing based Data Augmentation
Improving Code-Switching and Named Entity Recognition in ASR with Speech Editing based Data Augmentation
Zheng Liang
Zheshu Song
Ziyang Ma
Chenpeng Du
K. Yu
Xie Chen
176
6
0
14 Jun 2023
Mega-TTS: Zero-Shot Text-to-Speech at Scale with Intrinsic Inductive
  Bias
Mega-TTS: Zero-Shot Text-to-Speech at Scale with Intrinsic Inductive Bias
Ziyue Jiang
Yi Ren
Zhe Ye
Jinglin Liu
Chen Zhang
...
Rongjie Huang
Chunfeng Wang
Xiang Yin
Zejun Ma
Zhou Zhao
DiffM
308
99
0
06 Jun 2023
FluentSpeech: Stutter-Oriented Automatic Speech Editing with
  Context-Aware Diffusion Models
FluentSpeech: Stutter-Oriented Automatic Speech Editing with Context-Aware Diffusion ModelsAnnual Meeting of the Association for Computational Linguistics (ACL), 2023
Ziyue Jiang
Qiang Yang
Jia-li Zuo
Zhe Ye
Rongjie Huang
Yixiang Ren
Zhou Zhao
DiffM
201
34
0
23 May 2023
CLAPSpeech: Learning Prosody from Text Context with Contrastive
  Language-Audio Pre-training
CLAPSpeech: Learning Prosody from Text Context with Contrastive Language-Audio Pre-trainingAnnual Meeting of the Association for Computational Linguistics (ACL), 2023
Zhe Ye
Rongjie Huang
Yi Ren
Ziyue Jiang
Jinglin Liu
Jinzheng He
Xiang Yin
Zhou Zhao
CLIP
183
29
0
18 May 2023
DiffVoice: Text-to-Speech with Latent Diffusion
DiffVoice: Text-to-Speech with Latent DiffusionIEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Zhijun Liu
Yiwei Guo
K. Yu
DiffM
229
27
0
23 Apr 2023
A Comparative Analysis Of Latent Regressor Losses For Singing Voice
  Conversion
A Comparative Analysis Of Latent Regressor Losses For Singing Voice Conversion
Brendan O'Connor
S. Dixon
158
0
0
27 Feb 2023
Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers
Neural Codec Language Models are Zero-Shot Text to Speech SynthesizersIEEE Transactions on Audio, Speech, and Language Processing (IEEE TASLP), 2023
Chengyi Wang
Sanyuan Chen
Yu-Huan Wu
Zi-Hua Zhang
Long Zhou
...
Huaming Wang
Jinyu Li
Lei He
Sheng Zhao
Furu Wei
628
1,138
0
05 Jan 2023
ALCAP: Alignment-Augmented Music Captioner
ALCAP: Alignment-Augmented Music CaptionerConference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Zihao He
Weituo Hao
Weiyi Lu
Changyou Chen
Kristina Lerman
Xuchen Song
250
1
0
21 Dec 2022
Emotion Selectable End-to-End Text-based Speech Editing
Emotion Selectable End-to-End Text-based Speech EditingArtificial Intelligence (AI), 2022
Tao Wang
Jiangyan Yi
Ruibo Fu
Jianhua Tao
Zhengqi Wen
Chu Yuan Zhang
233
5
0
20 Dec 2022
MaskedSpeech: Context-aware Speech Synthesis with Masking Strategy
MaskedSpeech: Context-aware Speech Synthesis with Masking StrategyInterspeech (Interspeech), 2022
Ya-Jie Zhang
Wei Song
Ya Yue
Zhengchen Zhang
Youzheng Wu
Xiaodong He
197
8
0
11 Nov 2022
ERNIE-SAT: Speech and Text Joint Pretraining for Cross-Lingual
  Multi-Speaker Text-to-Speech
ERNIE-SAT: Speech and Text Joint Pretraining for Cross-Lingual Multi-Speaker Text-to-Speech
Xiaoran Fan
Chao Pang
Tian Yuan
Richard He Bai
Renjie Zheng
...
Junkun Chen
Zeyu Chen
Liang Huang
Yu Sun
Hua Wu
343
1
0
07 Nov 2022
Virtuoso: Massive Multilingual Speech-Text Joint Semi-Supervised
  Learning for Text-To-Speech
Virtuoso: Massive Multilingual Speech-Text Joint Semi-Supervised Learning for Text-To-SpeechIEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Takaaki Saeki
Heiga Zen
Zhehuai Chen
Nobuyuki Morioka
Gary Wang
Yu Zhang
Ankur Bapna
Andrew Rosenberg
Bhuvana Ramabhadran
329
25
0
27 Oct 2022
Building Machine Translation Systems for the Next Thousand Languages
Building Machine Translation Systems for the Next Thousand Languages
Ankur Bapna
Isaac Caswell
Julia Kreutzer
Orhan Firat
D. Esch
...
Apurva Shah
Yanping Huang
Zhiwen Chen
Yonghui Wu
Macduff Hughes
369
117
0
09 May 2022
1
Page 1 of 1