All Papers

Title
Speak, Edit, Repeat: High-Fidelity Voice Editing and Zero-Shot TTS with Cross-Attentive Mamba Baher Mohammad Magauiya Zhussip Stamatios Lefkimmiatis Mamba 124 0 0 06 Oct 2025
Instance-Specific Test-Time Training for Speech Editing in the Wild Taewoo Kim Uijong Lee H. Park Choongsang Cho Nam In Park Young Han Lee 182 0 0 16 Jun 2025
PartialEdit: Identifying Partial Deepfakes in the Era of Neural Speech Editing You Zhang Baotong Tian Lin Zhang Z. Duan 127 3 0 03 Jun 2025
SeamlessEdit: Background Noise Aware Zero-Shot Speech Editing with in-Context Enhancement Kuan-Yu Chen Jeng-Lin Li Jian-Jiun Ding 252 0 0 20 May 2025
SeniorTalk: A Chinese Conversation Dataset with Rich Annotations for Super-Aged Seniors Yang Chen Hui Wang Shiyao Wang Jianfei Chen Jiabei He Jiaming Zhou Xi Yang Longji Xu Yonghua Lin Yong Qin 183 3 0 20 Mar 2025
SSR-Speech: Towards Stable, Safe and Robust Zero-shot Text-based Speech Editing and Synthesis Helin Wang Meng Yu Jiarui Hai Chen Chen Yuchen Hu Rilin Chen Najim Dehak Dong Yu 328 10 0 03 Jan 2025
DiffEditor: Enhancing Speech Editing with Semantic Enrichment and Acoustic Consistency Yang Chen Yuhang Jia Shiwan Zhao Ziyue Jiang Haoran Li Jiarong Kang Yong Qin 113 3 0 19 Sep 2024
SongCreator: Lyrics-based Universal Song Generation. Neural Information Processing Systems (NeurIPS), 2024 Shun Lei Yixuan Zhou Boshi Tang Max W. Y. Lam Feng Liu Hangyu Liu Jingcheng Wu Shiyin Kang Zhiyong Wu Helen Meng 253 16 0 09 Sep 2024
Automatic Voice Identification after Speech Resynthesis using PPG. The Speaker and Language Recognition Workshop (Odyssey), 2024 Thibault Gaudier Marie Tahon Anthony Larcher Yannick Esteve 166 0 0 05 Aug 2024
Speech Editing -- a Summary Tobias Kässmann Yining Liu Danni Liu 145 1 0 24 Jul 2024
Autoregressive Diffusion Transformer for Text-to-Speech Synthesis Zhijun Liu Shuai Wang Sho Inoue Qibing Bai Haizhou Li DiffM 168 31 0 08 Jun 2024
VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild Puyuan Peng Po-Yao (Bernie) Huang Daniel Li Abdelrahman Mohamed David Harwath 399 144 0 25 Mar 2024
AttentionStitch: How Attention Solves the Speech Editing Problem Antonios Alexos Pierre Baldi 198 3 0 05 Mar 2024
Fine-Grained Quantitative Emotion Editing for Speech Generation Sho Inoue Kun Zhou Shuai Wang Haizhou Li 206 5 0 04 Mar 2024
uSee: Unified Speech Enhancement and Editing with Conditional Diffusion Models. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023 Muqiao Yang Chunlei Zhang Yong-mei Xu Zhongweiyang Xu Heming Wang Bhiksha Raj Dong Yu DiffM 155 11 0 02 Oct 2023
FluentEditor: Text-based Speech Editing by Considering Acoustic and Prosody Consistency. Interspeech (Interspeech), 2023 Rui Liu Jiatian Xi Ziyue Jiang Haizhou Li 341 7 0 21 Sep 2023
Cross-Utterance Conditioned VAE for Speech Generation. IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023 Yongqian Li Cheng Yu Guangzhi Sun Weiqin Zu Zheng Tian ... Wei Pan Chao Zhang Jun Wang Yang Yang Fanglei Sun 168 3 0 08 Sep 2023
Improving Code-Switching and Named Entity Recognition in ASR with Speech Editing based Data Augmentation Zheng Liang Zheshu Song Ziyang Ma Chenpeng Du K. Yu Xie Chen 128 5 0 14 Jun 2023
Mega-TTS: Zero-Shot Text-to-Speech at Scale with Intrinsic Inductive Bias Ziyue Jiang Yi Ren Zhe Ye Jinglin Liu Chen Zhang ... Rongjie Huang Chunfeng Wang Xiang Yin Zejun Ma Zhou Zhao DiffM 216 95 0 06 Jun 2023
FluentSpeech: Stutter-Oriented Automatic Speech Editing with Context-Aware Diffusion Models. Annual Meeting of the Association for Computational Linguistics (ACL), 2023 Ziyue Jiang Qiang Yang Jia-li Zuo Zhe Ye Rongjie Huang Yixiang Ren Zhou Zhao DiffM 153 28 0 23 May 2023
DiffVoice: Text-to-Speech with Latent Diffusion. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023 Zhijun Liu Yiwei Guo K. Yu DiffM 163 27 0 23 Apr 2023
Emotion Selectable End-to-End Text-based Speech Editing. Artificial Intelligence (AI), 2022 Tao Wang Jiangyan Yi Ruibo Fu Jianhua Tao Zhengqi Wen Chu Yuan Zhang 161 4 0 20 Dec 2022
MaskedSpeech: Context-aware Speech Synthesis with Masking Strategy. Interspeech (Interspeech), 2022 Ya-Jie Zhang Wei Song Ya Yue Zhengchen Zhang Youzheng Wu Xiaodong He 140 7 0 11 Nov 2022
ERNIE-SAT: Speech and Text Joint Pretraining for Cross-Lingual Multi-Speaker Text-to-Speech Xiaoran Fan Chao Pang Tian Yuan Richard He Bai Renjie Zheng ... Junkun Chen Zeyu Chen Liang Huang Yu Sun Hua Wu 238 1 0 07 Nov 2022
Towards zero-shot Text-based voice editing using acoustic context conditioning, utterance embeddings, and reference encoders Jason Fong Yun Wang Prabhav Agrawal Vimal Manohar Jilong Wu Thilo Kohler Qing He 134 0 0 28 Oct 2022
The PartialSpoof Database and Countermeasures for the Detection of Short Fake Speech Segments Embedded in an Utterance. IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2022 Lin Zhang Xin Wang Erica Cooper Nicholas W. D. Evans Junichi Yamagishi 288 86 0 11 Apr 2022
A$^3$T: Alignment-Aware Acoustic and Text Pretraining for Speech Synthesis and Editing. International Conference on Machine Learning (ICML), 2022 Richard He Bai Renjie Zheng Junkun Chen Xintong Li Mingbo Ma Liang Huang 209 60 0 18 Mar 2022
CampNet: Context-Aware Mask Prediction for End-to-End Text-Based Speech Editing. IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2022 Tao Wang Jiangyan Yi Ruibo Fu Jianhua Tao Zhengqi Wen KELM 123 25 0 21 Feb 2022
SpeechPainter: Text-conditioned Speech Inpainting. Interspeech (Interspeech), 2022 Zalan Borsos Matthew Sharifi Marco Tagliasacchi 174 35 0 15 Feb 2022
Environment Aware Text-to-Speech Synthesis. Interspeech (Interspeech), 2021 Daxin Tan Guangyan Zhang Tan Lee 197 8 0 08 Oct 2021
EdiTTS: Score-based Editing for Controllable Text-to-Speech Jaesung Tae Hyeongju Kim Taesu Kim DiffM 383 47 0 06 Oct 2021
