2203.09690
A$^3$T: Alignment-Aware Acoustic and Text Pretraining for Speech Synthesis and Editing
International Conference on Machine Learning (ICML), 2022
18 March 2022
Richard He Bai
Renjie Zheng
Junkun Chen
Xintong Li
Mingbo Ma
Liang Huang
Papers citing
"A$^3$T: Alignment-Aware Acoustic and Text Pretraining for Speech Synthesis and Editing"
37 / 37 papers shown
Speak, Edit, Repeat: High-Fidelity Voice Editing and Zero-Shot TTS with Cross-Attentive Mamba
Baher Mohammad
Magauiya Zhussip
Stamatios Lefkimmiatis
Mamba
209
1
0
06 Oct 2025
Instance-Specific Test-Time Training for Speech Editing in the Wild
Taewoo Kim
Uijong Lee
H. Park
Choongsang Cho
Nam In Park
Young Han Lee
282
1
0
16 Jun 2025
SpeakStream: Streaming Text-to-Speech with Interleaved Data
Richard He Bai
Zijin Gu
Tatiana Likhomanenko
Navdeep Jaitly
AuLLM
AI4TS
287
5
0
25 May 2025
Voice Cloning: Comprehensive Survey
Hussam Azzuni
Abdulmotaleb El Saddik
VLM
450
6
0
01 May 2025
SeniorTalk: A Chinese Conversation Dataset with Rich Annotations for Super-Aged Seniors
Yang Chen
Hui Wang
Shiyao Wang
Jianfei Chen
Jiabei He
Jiaming Zhou
Xi Yang
Longji Xu
Yonghua Lin
Yong Qin
299
7
0
20 Mar 2025
SSR-Speech: Towards Stable, Safe and Robust Zero-shot Text-based Speech Editing and Synthesis
Helin Wang
Meng Yu
Jiarui Hai
Chen Chen
Yuchen Hu
Rilin Chen
Najim Dehak
Dong Yu
513
16
0
03 Jan 2025
F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Emmanouil Benetos
Zhikang Niu
Ziyang Ma
Keqi Deng
Chunhui Wang
Jian Zhao
Kai Yu
Xie Chen
822
366
0
09 Oct 2024
MDSGen: Fast and Efficient Masked Diffusion Temporal-Aware Transformers for Open-Domain Sound Generation
International Conference on Learning Representations (ICLR), 2024
T. Pham
Tri Ton
Chang D. Yoo
390
8
0
03 Oct 2024
DiffEditor: Enhancing Speech Editing with Semantic Enrichment and Acoustic Consistency
Yang Chen
Yuhang Jia
Shiwan Zhao
Ziyue Jiang
Haoran Li
Jiarong Kang
Yong Qin
206
3
0
19 Sep 2024
Exploring Prediction Targets in Masked Pre-Training for Speech Foundation Models
Li-Wei Chen
Takuya Higuchi
He Bai
Ahmed Hussen Abdelaziz
Alexander Rudnicky
Shinji Watanabe
Tatiana Likhomanenko
B. Theobald
Zakaria Aldeneh
410
1
0
16 Sep 2024
SongCreator: Lyrics-based Universal Song Generation
Neural Information Processing Systems (NeurIPS), 2024
Shun Lei
Yixuan Zhou
Boshi Tang
Max W. Y. Lam
Feng Liu
Hangyu Liu
Jingcheng Wu
Shiyin Kang
Zhiyong Wu
Helen Meng
365
23
0
09 Sep 2024
Speech Editing -- a Summary
Tobias Kässmann
Yining Liu
Danni Liu
198
2
0
24 Jul 2024
dMel: Speech Tokenization made Simple
Richard He Bai
Tatiana Likhomanenko
Ruixiang Zhang
Zijin Gu
Zakaria Aldeneh
Navdeep Jaitly
398
11
0
22 Jul 2024
Autoregressive Diffusion Transformer for Text-to-Speech Synthesis
Zhijun Liu
Shuai Wang
Sho Inoue
Qibing Bai
Haizhou Li
DiffM
203
41
0
08 Jun 2024
VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild
Puyuan Peng
Po-Yao (Bernie) Huang
Daniel Li
Abdelrahman Mohamed
David Harwath
561
177
0
25 Mar 2024
AttentionStitch: How Attention Solves the Speech Editing Problem
Antonios Alexos
Pierre Baldi
290
3
0
05 Mar 2024
Making Flow-Matching-Based Zero-Shot Text-to-Speech Laugh as You Like
Naoyuki Kanda
Xiaofei Wang
Sefik Emre Eskimez
Manthan Thakker
Hemin Yang
...
Yufei Xia
Jinzhu Li
Yanqing Liu
Sheng Zhao
Michael Zeng
412
17
0
12 Feb 2024
Proactive Detection of Voice Cloning with Localized Watermarking
Robin San Roman
Pierre Fernandez
Alexandre Défossez
Teddy Furon
Tuan Tran
Hady ElSahar
560
113
0
30 Jan 2024
FedTabDiff: Federated Learning of Diffusion Probabilistic Models for Synthetic Mixed-Type Tabular Data Generation
Timur Sattarov
Marco Schreyer
Damian Borth
FedML
DiffM
MedIm
251
12
0
11 Jan 2024
Generative Pre-training for Speech with Flow Matching
International Conference on Learning Representations (ICLR), 2023
Alexander H. Liu
Matt Le
Apoorv Vyas
Bowen Shi
Andros Tjandra
Wei-Ning Hsu
448
65
0
25 Oct 2023
FluentEditor: Text-based Speech Editing by Considering Acoustic and Prosody Consistency
Interspeech (Interspeech), 2023
Rui Liu
Jiatian Xi
Ziyue Jiang
Haizhou Li
438
7
0
21 Sep 2023
Cross-Utterance Conditioned VAE for Speech Generation
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023
Yongqian Li
Cheng Yu
Guangzhi Sun
Weiqin Zu
Zheng Tian
...
Wei Pan
Chao Zhang
Jun Wang
Yang Yang
Fanglei Sun
253
4
0
08 Sep 2023
SpeechX: Neural Codec Language Model as a Versatile Speech Transformer
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023
Xiaofei Wang
Manthan Thakker
Zhuo Chen
Naoyuki Kanda
Sefik Emre Eskimez
Sanyuan Chen
M. Tang
Shujie Liu
Jinyu Li
Takuya Yoshioka
493
118
0
14 Aug 2023
Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale
Neural Information Processing Systems (NeurIPS), 2023
Matt Le
Apoorv Vyas
Bowen Shi
Brian Karrer
Leda Sari
...
Mary Williamson
Vimal Manohar
Yossi Adi
Jay Mahadeokar
Wei-Ning Hsu
AuLLM
399
478
0
23 Jun 2023
Improving Code-Switching and Named Entity Recognition in ASR with Speech Editing based Data Augmentation
Zheng Liang
Zheshu Song
Ziyang Ma
Chenpeng Du
K. Yu
Xie Chen
176
6
0
14 Jun 2023
Mega-TTS: Zero-Shot Text-to-Speech at Scale with Intrinsic Inductive Bias
Ziyue Jiang
Yi Ren
Zhe Ye
Jinglin Liu
Chen Zhang
...
Rongjie Huang
Chunfeng Wang
Xiang Yin
Zejun Ma
Zhou Zhao
DiffM
308
99
0
06 Jun 2023
FluentSpeech: Stutter-Oriented Automatic Speech Editing with Context-Aware Diffusion Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Ziyue Jiang
Qiang Yang
Jia-li Zuo
Zhe Ye
Rongjie Huang
Yixiang Ren
Zhou Zhao
DiffM
201
34
0
23 May 2023
CLAPSpeech: Learning Prosody from Text Context with Contrastive Language-Audio Pre-training
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Zhe Ye
Rongjie Huang
Yi Ren
Ziyue Jiang
Jinglin Liu
Jinzheng He
Xiang Yin
Zhou Zhao
CLIP
183
29
0
18 May 2023
DiffVoice: Text-to-Speech with Latent Diffusion
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Zhijun Liu
Yiwei Guo
K. Yu
DiffM
229
27
0
23 Apr 2023
A Comparative Analysis Of Latent Regressor Losses For Singing Voice Conversion
Brendan O'Connor
S. Dixon
158
0
0
27 Feb 2023
Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers
IEEE Transactions on Audio, Speech, and Language Processing (IEEE TASLP), 2023
Chengyi Wang
Sanyuan Chen
Yu-Huan Wu
Zi-Hua Zhang
Long Zhou
...
Huaming Wang
Jinyu Li
Lei He
Sheng Zhao
Furu Wei
628
1,138
0
05 Jan 2023
ALCAP: Alignment-Augmented Music Captioner
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Zihao He
Weituo Hao
Weiyi Lu
Changyou Chen
Kristina Lerman
Xuchen Song
250
1
0
21 Dec 2022
Emotion Selectable End-to-End Text-based Speech Editing
Artificial Intelligence (AI), 2022
Tao Wang
Jiangyan Yi
Ruibo Fu
Jianhua Tao
Zhengqi Wen
Chu Yuan Zhang
233
5
0
20 Dec 2022
MaskedSpeech: Context-aware Speech Synthesis with Masking Strategy
Interspeech (Interspeech), 2022
Ya-Jie Zhang
Wei Song
Ya Yue
Zhengchen Zhang
Youzheng Wu
Xiaodong He
197
8
0
11 Nov 2022
ERNIE-SAT: Speech and Text Joint Pretraining for Cross-Lingual Multi-Speaker Text-to-Speech
Xiaoran Fan
Chao Pang
Tian Yuan
Richard He Bai
Renjie Zheng
...
Junkun Chen
Zeyu Chen
Liang Huang
Yu Sun
Hua Wu
343
1
0
07 Nov 2022
Virtuoso: Massive Multilingual Speech-Text Joint Semi-Supervised Learning for Text-To-Speech
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Takaaki Saeki
Heiga Zen
Zhehuai Chen
Nobuyuki Morioka
Gary Wang
Yu Zhang
Ankur Bapna
Andrew Rosenberg
Bhuvana Ramabhadran
329
25
0
27 Oct 2022
Building Machine Translation Systems for the Next Thousand Languages
Ankur Bapna
Isaac Caswell
Julia Kreutzer
Orhan Firat
D. Esch
...
Apurva Shah
Yanping Huang
Zhiwen Chen
Yonghui Wu
Macduff Hughes
369
117
0
09 May 2022