arXiv:2202.13066
Revisiting Over-Smoothness in Text to Speech
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
26 February 2022
Yi Ren
Xu Tan
Tao Qin
Zhou Zhao
Tie-Yan Liu
Papers citing
"Revisiting Over-Smoothness in Text to Speech"
Rethinking Long-tailed Dataset Distillation: A Uni-Level Framework with Unbiased Recovery and Relabeling
Xiao Cui
Yulei Qin
Xinyue Li
Wengang Zhou
Hongsheng Li
Houqiang Li
DD
FedML
241
0
0
24 Nov 2025
ZipVoice: Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching
Han Zhu
Wei Kang
Zengwei Yao
Liyong Guo
Fangjun Kuang
Zhaoqing Li
Weiji Zhuang
Long Lin
Daniel Povey
243
8
0
16 Jun 2025
Instance-Specific Test-Time Training for Speech Editing in the Wild
Taewoo Kim
Uijong Lee
H. Park
Choongsang Cho
Nam In Park
Young Han Lee
146
0
0
16 Jun 2025
BemaGANv2: A Tutorial and Comparative Survey of GAN-based Vocoders for Long-Term Audio Generation
Taesoo Park
Mungwi Jeong
Mingyu Park
Narae Kim
Junyoung Kim
Mujung Kim
Jisang Yoo
Hoyun Lee
Sanghoon Kim
Soonchul Kwon
117
0
0
11 Jun 2025
NaturalL2S: End-to-End High-quality Multispeaker Lip-to-Speech Synthesis with Differential Digital Signal Processing
Neural Networks (NN), 2025
Yifan Liang
Fangkun Liu
Andong Li
Xiaodong Li
C. Zheng
252
2
0
17 Feb 2025
FELLE: Autoregressive Speech Synthesis with Token-Wise Coarse-to-Fine Flow Matching
Hui Wang
Shujie Liu
Lingwei Meng
Jiajian Li
Yifan Yang
...
Yanqing Liu
Haoqin Sun
Jiaming Zhou
Yan Lu
Yong Qin
238
11
0
16 Feb 2025
KALL-E: Autoregressive Speech Synthesis with Next-Distribution Prediction
Kangxiang Xia
Xinfa Zhu
Lei Xie
WenJie Tian
W. Li
Lei Xie
VLM
360
0
0
22 Dec 2024
Lina-Speech: Gated Linear Attention and Initial-State Tuning for Multi-Sample Prompting Text-To-Speech Synthesis
Théodor Lemerle
Harrison Vanderbyl
Vaibhav Srivastav
Nicolas Obin
132
4
0
30 Oct 2024
Takin: A Cohort of Superior Quality Zero-shot Speech Generation Models
Sijing Chen
Qi Liu
Laipeng He
Tianwei He
Wendi He
...
Huimin Zhang
Xiang Zhang
Guangcheng Zhao
Hongbin Zhou
Pengpeng Zou
212
12
0
18 Sep 2024
Acquiring Pronunciation Knowledge from Transcribed Speech Audio via Multi-task Learning
Siqi Sun
Korin Richmond
254
0
0
15 Sep 2024
Improving Robustness of Diffusion-Based Zero-Shot Speech Synthesis via Stable Formant Generation
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
C. Han
Seokgi Lee
Gyuhyeon Nam
Gyeongsu Chae
DiffM
951
0
0
14 Sep 2024
Central Kurdish Text-to-Speech Synthesis with Novel End-to-End Transformer Training
Hawraz A. Ahmad
Tarik A. Rashid
185
1
0
06 Aug 2024
Fake it to make it: Using synthetic data to remedy the data shortage in joint multimodal speech-and-gesture synthesis
Shivam Mehta
Anna Deichler
Jim O'Regan
Birger Moëll
Jonas Beskow
G. Henter
Simon Alexanderson
200
7
0
30 Apr 2024
FlashSpeech: Efficient Zero-Shot Speech Synthesis
Zhen Ye
Zeqian Ju
Haohe Liu
Xu Tan
Jianyi Chen
...
Weizhen Bian
Shulin He
Qi-fei Liu
Yi-Ting Guo
Wei Xue
230
30
0
23 Apr 2024
BASE TTS: Lessons from building a billion-parameter Text-to-Speech model on 100K hours of data
Mateusz Lajszczak
Guillermo Cámbara
Yang Li
Fatih Beyhan
Arent van Korlaar
...
Bartosz Putrycz
Soledad López Gambino
Kayeon Yoo
Elena Sokolova
Thomas Drugman
LM&MA
298
109
0
12 Feb 2024
StreamVoice: Streamable Context-Aware Language Modeling for Real-time Zero-Shot Voice Conversion
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Zhichao Wang
Yuan-Jui Chen
Xinsheng Wang
Lei Xie
Yuping Wang
277
12
0
19 Jan 2024
FluentEditor: Text-based Speech Editing by Considering Acoustic and Prosody Consistency
Interspeech (Interspeech), 2023
Rui Liu
Jiatian Xi
Ziyue Jiang
Haizhou Li
285
7
0
21 Sep 2023
Let There Be Sound: Reconstructing High Quality Speech from Silent Videos
AAAI Conference on Artificial Intelligence (AAAI), 2023
Ji-Hoon Kim
Jaehun Kim
Joon Son Chung
219
10
0
29 Aug 2023
DiffProsody: Diffusion-based Latent Prosody Generation for Expressive Speech Synthesis with Prosody Conditional Adversarial Training
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023
H. Oh
Sang-Hoon Lee
Seong-Whan Lee
DiffM
234
26
0
31 Jul 2023
SCRAPS: Speech Contrastive Representations of Acoustic and Phonetic Spaces
European Conference on Artificial Intelligence (ECAI), 2023
Iván Vallés-Pérez
Grzegorz Beringer
Piotr Bilinski
G. Cook
Roberto Barra-Chicote
122
1
0
23 Jul 2023
PauseSpeech: Natural Speech Synthesis via Pre-trained Language Model and Pause-based Prosody Modeling
Asian Conference on Pattern Recognition (ACPR), 2023
Ji-Sang Hwang
Sang-Hoon Lee
Seong-Whan Lee
150
4
0
13 Jun 2023
Interpretable Style Transfer for Text-to-Speech with ControlVAE and Diffusion Bridge
Interspeech (Interspeech), 2023
Wenhao Guan
Tao Li
Yishuang Li
Hukai Huang
Q. Hong
Lin Li
DiffM
139
6
0
07 Jun 2023
Towards Robust FastSpeech 2 by Modelling Residual Multimodality
Interspeech (Interspeech), 2023
Fabian Kögel
Bac Nguyen
Fabien Cardinaux
116
3
0
02 Jun 2023
Stochastic Pitch Prediction Improves the Diversity and Naturalness of Speech in Glow-TTS
Interspeech (Interspeech), 2023
Sewade Ogun
Vincent Colotte
Emmanuel Vincent
DiffM
130
5
0
28 May 2023
Diverse and Expressive Speech Prosody Prediction with Denoising Diffusion Probabilistic Model
Interspeech (Interspeech), 2023
Xiang Li
Songxiang Liu
Max W. Y. Lam
Zhiyong Wu
Chao Weng
Helen Meng
DiffM
188
5
0
26 May 2023
FluentSpeech: Stutter-Oriented Automatic Speech Editing with Context-Aware Diffusion Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Ziyue Jiang
Qiang Yang
Jia-li Zuo
Zhe Ye
Rongjie Huang
Yixiang Ren
Zhou Zhao
DiffM
141
27
0
23 May 2023
RMSSinger: Realistic-Music-Score based Singing Voice Synthesis
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Jinzheng He
Jinglin Liu
Zhenhui Ye
Rongjie Huang
Chenye Cui
Huadai Liu
Zhou Zhao
DiffM
192
28
0
18 May 2023
AlignSTS: Speech-to-Singing Conversion via Cross-Modal Alignment
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Ruiqi Li
Rongjie Huang
Lichao Zhang
Jinglin Liu
Zhou Zhao
330
4
0
08 May 2023
Multilingual Multiaccented Multispeaker TTS with RADTTS
Rohan Badlani
Rafael Valle
Kevin J. Shih
J. F. Santos
Siddharth Gururani
Bryan Catanzaro
139
7
0
24 Jan 2023
Towards Building Text-To-Speech Systems for the Next Billion Users
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Gokul Karthik Kumar
V. PraveenS.
Pratyush Kumar
Mitesh M. Khapra
Karthik Nandakumar
175
27
0
17 Nov 2022
Grad-StyleSpeech: Any-speaker Adaptive Text-to-Speech Synthesis with Diffusion Models
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Minki Kang
Dong Min
Sung Ju Hwang
DiffM
257
61
0
17 Nov 2022
DSPGAN: a GAN-based universal vocoder for high-fidelity TTS by time-frequency domain supervision from DSP
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Kun Song
Yongmao Zhang
Yinjiao Lei
Jian Cong
Hanzhao Li
Linfu Xie
Gang He
Jinfeng Bai
150
22
0
02 Nov 2022
Robust MelGAN: A robust universal neural vocoder for high-fidelity TTS
International Symposium on Chinese Spoken Language Processing (ISCSLP), 2022
Kun Song
Jian Cong
Xinsheng Wang
Yongmao Zhang
Linfu Xie
Ning Jiang
Haiying Wu
128
0
0
31 Oct 2022
Xiaoicesing 2: A High-Fidelity Singing Voice Synthesizer Based on Generative Adversarial Network
Interspeech (Interspeech), 2022
Chunhui Wang
Chang Zeng
Xing He
119
20
0
26 Oct 2022
ProDiff: Progressive Fast Diffusion Model For High-Quality Text-to-Speech
ACM Multimedia (ACM MM), 2022
Rongjie Huang
Zhou Zhao
Huadai Liu
Jinglin Liu
Chenye Cui
Yi Ren
DiffM
240
228
0
13 Jul 2022
A Survey on Non-Autoregressive Generation for Neural Machine Translation and Beyond
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Yisheng Xiao
Lijun Wu
Junliang Guo
Juntao Li
Hao Fei
Tao Qin
Tie-Yan Liu
3DV
MedIm
AI4CE
186
109
0
20 Apr 2022
Learn2Sing 2.0: Diffusion and Mutual Information-Based Target Speaker SVS by Learning from Singing Teacher
Interspeech (Interspeech), 2022
Heyang Xue
Xinsheng Wang
Yongmao Zhang
Lei Xie
Pengcheng Zhu
Mengxiao Bi
DiffM
117
14
0
30 Mar 2022