Revisiting Over-Smoothness in Text to Speech

Annual Meeting of the Association for Computational Linguistics (ACL), 2022

26 February 2022

Xu Tan

Zhou Zhao

Papers citing "Revisiting Over-Smoothness in Text to Speech"

Rethinking Long-tailed Dataset Distillation: A Uni-Level Framework with Unbiased Recovery and Relabeling Xiao Cui Yulei Qin Xinyue Li Wengang Zhou Hongsheng Li Houqiang Li DD FedML 241 0 0 24 Nov 2025
ZipVoice: Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching Han Zhu Wei Kang Zengwei Yao Liyong Guo Fangjun Kuang Zhaoqing Li Weiji Zhuang Long Lin Daniel Povey 243 8 0 16 Jun 2025
Instance-Specific Test-Time Training for Speech Editing in the Wild Taewoo Kim Uijong Lee H. Park Choongsang Cho Nam In Park Young Han Lee 146 0 0 16 Jun 2025
BemaGANv2: A Tutorial and Comparative Survey of GAN-based Vocoders for Long-Term Audio Generation Taesoo Park Mungwi Jeong Mingyu Park Narae Kim Junyoung Kim Mujung Kim Jisang Yoo Hoyun Lee Sanghoon Kim Soonchul Kwon 117 0 0 11 Jun 2025
NaturalL2S: End-to-End High-quality Multispeaker Lip-to-Speech Synthesis with Differential Digital Signal Processing Neural Networks (NN), 2025 Yifan Liang Fangkun Liu Andong Li Xiaodong Li C. Zheng 252 2 0 17 Feb 2025
FELLE: Autoregressive Speech Synthesis with Token-Wise Coarse-to-Fine Flow Matching Hui Wang Shujie Liu Lingwei Meng Jiajian Li Yifan Yang ... Yanqing Liu Haoqin Sun Jiaming Zhou Yan Lu Yong Qin 238 11 0 16 Feb 2025
KALL-E: Autoregressive Speech Synthesis with Next-Distribution Prediction Kangxiang Xia Xinfa Zhu Lei Xie WenJie Tian W. Li Lei Xie VLM 360 0 0 22 Dec 2024
Lina-Speech: Gated Linear Attention and Initial-State Tuning for Multi-Sample Prompting Text-To-Speech Synthesis Théodor Lemerle Harrison Vanderbyl Vaibhav Srivastav Nicolas Obin 132 4 0 30 Oct 2024
Takin: A Cohort of Superior Quality Zero-shot Speech Generation Models Sijing Chen Qi Liu Laipeng He Tianwei He Wendi He ... Huimin Zhang Xiang Zhang Guangcheng Zhao Hongbin Zhou Pengpeng Zou 212 12 0 18 Sep 2024
Acquiring Pronunciation Knowledge from Transcribed Speech Audio via Multi-task Learning Siqi Sun Korin Richmond 254 0 0 15 Sep 2024
Improving Robustness of Diffusion-Based Zero-Shot Speech Synthesis via Stable Formant Generation IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024 C. Han Seokgi Lee Gyuhyeon Nam Gyeongsu Chae DiffM 951 0 0 14 Sep 2024
Central Kurdish Text-to-Speech Synthesis with Novel End-to-End Transformer Training Hawraz A. Ahmad Tarik A. Rashid 185 1 0 06 Aug 2024
Fake it to make it: Using synthetic data to remedy the data shortage in joint multimodal speech-and-gesture synthesis Shivam Mehta Anna Deichler Jim O'Regan Birger Moëll Jonas Beskow G. Henter Simon Alexanderson 200 7 0 30 Apr 2024
FlashSpeech: Efficient Zero-Shot Speech Synthesis Zhen Ye Zeqian Ju Haohe Liu Xu Tan Jianyi Chen ... Weizhen Bian Shulin He Qi-fei Liu Yi-Ting Guo Wei Xue 230 30 0 23 Apr 2024
BASE TTS: Lessons from building a billion-parameter Text-to-Speech model on 100K hours of data Mateusz Lajszczak Guillermo Cámbara Yang Li Fatih Beyhan Arent van Korlaar ... Bartosz Putrycz Soledad López Gambino Kayeon Yoo Elena Sokolova Thomas Drugman LM&MA 298 109 0 12 Feb 2024
StreamVoice: Streamable Context-Aware Language Modeling for Real-time Zero-Shot Voice Conversion Annual Meeting of the Association for Computational Linguistics (ACL), 2024 Zhichao Wang Yuan-Jui Chen Xinsheng Wang Lei Xie Yuping Wang 277 12 0 19 Jan 2024
FluentEditor: Text-based Speech Editing by Considering Acoustic and Prosody Consistency Interspeech (Interspeech), 2023 Rui Liu Jiatian Xi Ziyue Jiang Haizhou Li 285 7 0 21 Sep 2023
Let There Be Sound: Reconstructing High Quality Speech from Silent Videos AAAI Conference on Artificial Intelligence (AAAI), 2023 Ji-Hoon Kim Jaehun Kim Joon Son Chung 219 10 0 29 Aug 2023
DiffProsody: Diffusion-based Latent Prosody Generation for Expressive Speech Synthesis with Prosody Conditional Adversarial Training IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023 H. Oh Sang-Hoon Lee Seong-Whan Lee DiffM 234 26 0 31 Jul 2023
SCRAPS: Speech Contrastive Representations of Acoustic and Phonetic Spaces European Conference on Artificial Intelligence (ECAI), 2023 Iván Vallés-Pérez Grzegorz Beringer Piotr Bilinski G. Cook Roberto Barra-Chicote 122 1 0 23 Jul 2023
PauseSpeech: Natural Speech Synthesis via Pre-trained Language Model and Pause-based Prosody Modeling Asian Conference on Pattern Recognition (ACPR), 2023 Ji-Sang Hwang Sang-Hoon Lee Seong-Whan Lee 150 4 0 13 Jun 2023
Interpretable Style Transfer for Text-to-Speech with ControlVAE and Diffusion Bridge Interspeech (Interspeech), 2023 Wenhao Guan Tao Li Yishuang Li Hukai Huang Q. Hong Lin Li DiffM 139 6 0 07 Jun 2023
Towards Robust FastSpeech 2 by Modelling Residual Multimodality Interspeech (Interspeech), 2023 Fabian Kögel Bac Nguyen Fabien Cardinaux 116 3 0 02 Jun 2023
Stochastic Pitch Prediction Improves the Diversity and Naturalness of Speech in Glow-TTS Interspeech (Interspeech), 2023 Sewade Ogun Vincent Colotte Emmanuel Vincent DiffM 130 5 0 28 May 2023
Diverse and Expressive Speech Prosody Prediction with Denoising Diffusion Probabilistic Model Interspeech (Interspeech), 2023 Xiang Li Songxiang Liu Max W. Y. Lam Zhiyong Wu Chao Weng Helen Meng DiffM 188 5 0 26 May 2023
FluentSpeech: Stutter-Oriented Automatic Speech Editing with Context-Aware Diffusion Models Annual Meeting of the Association for Computational Linguistics (ACL), 2023 Ziyue Jiang Qiang Yang Jia-li Zuo Zhe Ye Rongjie Huang Yixiang Ren Zhou Zhao DiffM 141 27 0 23 May 2023
RMSSinger: Realistic-Music-Score based Singing Voice Synthesis Annual Meeting of the Association for Computational Linguistics (ACL), 2023 Jinzheng He Jinglin Liu Zhenhui Ye Rongjie Huang Chenye Cui Huadai Liu Zhou Zhao DiffM 192 28 0 18 May 2023
AlignSTS: Speech-to-Singing Conversion via Cross-Modal Alignment Annual Meeting of the Association for Computational Linguistics (ACL), 2023 Ruiqi Li Rongjie Huang Lichao Zhang Jinglin Liu Zhou Zhao 330 4 0 08 May 2023
Multilingual Multiaccented Multispeaker TTS with RADTTS Rohan Badlani Rafael Valle Kevin J. Shih J. F. Santos Siddharth Gururani Bryan Catanzaro 139 7 0 24 Jan 2023
Towards Building Text-To-Speech Systems for the Next Billion Users IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022 Gokul Karthik Kumar Praveen S V Pratyush Kumar Mitesh M. Khapra Karthik Nandakumar 175 27 0 17 Nov 2022
Grad-StyleSpeech: Any-speaker Adaptive Text-to-Speech Synthesis with Diffusion Models IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022 Minki Kang Dong Min Sung Ju Hwang DiffM 257 61 0 17 Nov 2022
DSPGAN: a GAN-based universal vocoder for high-fidelity TTS by time-frequency domain supervision from DSP IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022 Kun Song Yongmao Zhang Yinjiao Lei Jian Cong Hanzhao Li Linfu Xie Gang He Jinfeng Bai 150 22 0 02 Nov 2022
Robust MelGAN: A robust universal neural vocoder for high-fidelity TTS International Symposium on Chinese Spoken Language Processing (ISCSLP), 2022 Kun Song Jian Cong Xinsheng Wang Yongmao Zhang Linfu Xie Ning Jiang Haiying Wu 128 0 0 31 Oct 2022
Xiaoicesing 2: A High-Fidelity Singing Voice Synthesizer Based on Generative Adversarial Network Interspeech (Interspeech), 2022 Chunhui Wang Chang Zeng Xing He 119 20 0 26 Oct 2022
ProDiff: Progressive Fast Diffusion Model For High-Quality Text-to-Speech ACM Multimedia (ACM MM), 2022 Rongjie Huang Zhou Zhao Huadai Liu Jinglin Liu Chenye Cui Yi Ren DiffM 240 228 0 13 Jul 2022
A Survey on Non-Autoregressive Generation for Neural Machine Translation and Beyond IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022 Yisheng Xiao Lijun Wu Junliang Guo Juntao Li Hao Fei Tao Qin Tie-Yan Liu 3DV MedIm AI4CE 186 109 0 20 Apr 2022
Learn2Sing 2.0: Diffusion and Mutual Information-Based Target Speaker SVS by Learning from Singing Teacher Interspeech (Interspeech), 2022 Heyang Xue Xinsheng Wang Yongmao Zhang Lei Xie Pengcheng Zhu Mengxiao Bi DiffM 117 14 0 30 Mar 2022