arXiv: 2202.13066
Revisiting Over-Smoothness in Text to Speech


Annual Meeting of the Association for Computational Linguistics (ACL), 2022
26 February 2022
Yi Ren
Xu Tan
Tao Qin
Zhou Zhao
Tie-Yan Liu

Papers citing "Revisiting Over-Smoothness in Text to Speech"

37 / 37 papers shown
Rethinking Long-tailed Dataset Distillation: A Uni-Level Framework with Unbiased Recovery and Relabeling
Xiao Cui
Yulei Qin
Xinyue Li
Wengang Zhou
Hongsheng Li
Houqiang Li
DD, FedML
241
0
0
24 Nov 2025
ZipVoice: Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching
Han Zhu
Wei Kang
Zengwei Yao
Liyong Guo
Fangjun Kuang
Zhaoqing Li
Weiji Zhuang
Long Lin
Daniel Povey
243
8
0
16 Jun 2025
Instance-Specific Test-Time Training for Speech Editing in the Wild
Taewoo Kim
Uijong Lee
H. Park
Choongsang Cho
Nam In Park
Young Han Lee
146
0
0
16 Jun 2025
BemaGANv2: A Tutorial and Comparative Survey of GAN-based Vocoders for Long-Term Audio Generation
Taesoo Park
Mungwi Jeong
Mingyu Park
Narae Kim
Junyoung Kim
Mujung Kim
Jisang Yoo
Hoyun Lee
Sanghoon Kim
Soonchul Kwon
117
0
0
11 Jun 2025
NaturalL2S: End-to-End High-quality Multispeaker Lip-to-Speech Synthesis with Differential Digital Signal Processing
Neural Networks (NN), 2025
Yifan Liang
Fangkun Liu
Andong Li
Xiaodong Li
C. Zheng
252
2
0
17 Feb 2025
FELLE: Autoregressive Speech Synthesis with Token-Wise Coarse-to-Fine Flow Matching
Hui Wang
Shujie Liu
Lingwei Meng
Jiajian Li
Yifan Yang
...
Yanqing Liu
Haoqin Sun
Jiaming Zhou
Yan Lu
Yong Qin
238
11
0
16 Feb 2025
KALL-E: Autoregressive Speech Synthesis with Next-Distribution Prediction
Kangxiang Xia
Xinfa Zhu
Lei Xie
WenJie Tian
W. Li
Lei Xie
VLM
360
0
0
22 Dec 2024
Lina-Speech: Gated Linear Attention and Initial-State Tuning for Multi-Sample Prompting Text-To-Speech Synthesis
Théodor Lemerle
Harrison Vanderbyl
Vaibhav Srivastav
Nicolas Obin
132
4
0
30 Oct 2024
Takin: A Cohort of Superior Quality Zero-shot Speech Generation Models
Sijing Chen
Qi Liu
Laipeng He
Tianwei He
Wendi He
...
Huimin Zhang
Xiang Zhang
Guangcheng Zhao
Hongbin Zhou
Pengpeng Zou
212
12
0
18 Sep 2024
Acquiring Pronunciation Knowledge from Transcribed Speech Audio via Multi-task Learning
Siqi Sun
Korin Richmond
254
0
0
15 Sep 2024
Improving Robustness of Diffusion-Based Zero-Shot Speech Synthesis via Stable Formant Generation
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
C. Han
Seokgi Lee
Gyuhyeon Nam
Gyeongsu Chae
DiffM
951
0
0
14 Sep 2024
Central Kurdish Text-to-Speech Synthesis with Novel End-to-End Transformer Training
Hawraz A. Ahmad
Tarik A. Rashid
185
1
0
06 Aug 2024
Fake it to make it: Using synthetic data to remedy the data shortage in joint multimodal speech-and-gesture synthesis
Shivam Mehta
Anna Deichler
Jim O'Regan
Birger Moëll
Jonas Beskow
G. Henter
Simon Alexanderson
200
7
0
30 Apr 2024
FlashSpeech: Efficient Zero-Shot Speech Synthesis
Zhen Ye
Zeqian Ju
Haohe Liu
Xu Tan
Jianyi Chen
...
Weizhen Bian
Shulin He
Qi-fei Liu
Yi-Ting Guo
Wei Xue
230
30
0
23 Apr 2024
BASE TTS: Lessons from building a billion-parameter Text-to-Speech model on 100K hours of data
Mateusz Lajszczak
Guillermo Cámbara
Yang Li
Fatih Beyhan
Arent van Korlaar
...
Bartosz Putrycz
Soledad López Gambino
Kayeon Yoo
Elena Sokolova
Thomas Drugman
LM&MA
298
109
0
12 Feb 2024
StreamVoice: Streamable Context-Aware Language Modeling for Real-time Zero-Shot Voice Conversion
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Zhichao Wang
Yuan-Jui Chen
Xinsheng Wang
Lei Xie
Yuping Wang
277
12
0
19 Jan 2024
FluentEditor: Text-based Speech Editing by Considering Acoustic and Prosody Consistency
Interspeech (Interspeech), 2023
Rui Liu
Jiatian Xi
Ziyue Jiang
Haizhou Li
285
7
0
21 Sep 2023
Let There Be Sound: Reconstructing High Quality Speech from Silent Videos
AAAI Conference on Artificial Intelligence (AAAI), 2023
Ji-Hoon Kim
Jaehun Kim
Joon Son Chung
219
10
0
29 Aug 2023
DiffProsody: Diffusion-based Latent Prosody Generation for Expressive Speech Synthesis with Prosody Conditional Adversarial Training
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023
H. Oh
Sang-Hoon Lee
Seong-Whan Lee
DiffM
234
26
0
31 Jul 2023
SCRAPS: Speech Contrastive Representations of Acoustic and Phonetic Spaces
European Conference on Artificial Intelligence (ECAI), 2023
Iván Vallés-Pérez
Grzegorz Beringer
Piotr Bilinski
G. Cook
Roberto Barra-Chicote
122
1
0
23 Jul 2023
PauseSpeech: Natural Speech Synthesis via Pre-trained Language Model and Pause-based Prosody Modeling
Asian Conference on Pattern Recognition (ACPR), 2023
Ji-Sang Hwang
Sang-Hoon Lee
Seong-Whan Lee
150
4
0
13 Jun 2023
Interpretable Style Transfer for Text-to-Speech with ControlVAE and Diffusion Bridge
Interspeech (Interspeech), 2023
Wenhao Guan
Tao Li
Yishuang Li
Hukai Huang
Q. Hong
Lin Li
DiffM
139
6
0
07 Jun 2023
Towards Robust FastSpeech 2 by Modelling Residual Multimodality
Interspeech (Interspeech), 2023
Fabian Kögel
Bac Nguyen
Fabien Cardinaux
116
3
0
02 Jun 2023
Stochastic Pitch Prediction Improves the Diversity and Naturalness of Speech in Glow-TTS
Interspeech (Interspeech), 2023
Sewade Ogun
Vincent Colotte
Emmanuel Vincent
DiffM
130
5
0
28 May 2023
Diverse and Expressive Speech Prosody Prediction with Denoising Diffusion Probabilistic Model
Interspeech (Interspeech), 2023
Xiang Li
Songxiang Liu
Max W. Y. Lam
Zhiyong Wu
Chao Weng
Helen Meng
DiffM
188
5
0
26 May 2023
FluentSpeech: Stutter-Oriented Automatic Speech Editing with Context-Aware Diffusion Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Ziyue Jiang
Qiang Yang
Jia-li Zuo
Zhe Ye
Rongjie Huang
Yixiang Ren
Zhou Zhao
DiffM
141
27
0
23 May 2023
RMSSinger: Realistic-Music-Score based Singing Voice Synthesis
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Jinzheng He
Jinglin Liu
Zhenhui Ye
Rongjie Huang
Chenye Cui
Huadai Liu
Zhou Zhao
DiffM
192
28
0
18 May 2023
AlignSTS: Speech-to-Singing Conversion via Cross-Modal Alignment
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Ruiqi Li
Rongjie Huang
Lichao Zhang
Jinglin Liu
Zhou Zhao
330
4
0
08 May 2023
Multilingual Multiaccented Multispeaker TTS with RADTTS
Rohan Badlani
Rafael Valle
Kevin J. Shih
J. F. Santos
Siddharth Gururani
Bryan Catanzaro
139
7
0
24 Jan 2023
Towards Building Text-To-Speech Systems for the Next Billion Users
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Gokul Karthik Kumar
Praveen S. V.
Pratyush Kumar
Mitesh M. Khapra
Karthik Nandakumar
175
27
0
17 Nov 2022
Grad-StyleSpeech: Any-speaker Adaptive Text-to-Speech Synthesis with Diffusion Models
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Minki Kang
Dong Min
Sung Ju Hwang
DiffM
257
61
0
17 Nov 2022
DSPGAN: a GAN-based universal vocoder for high-fidelity TTS by time-frequency domain supervision from DSP
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Kun Song
Yongmao Zhang
Yinjiao Lei
Jian Cong
Hanzhao Li
Linfu Xie
Gang He
Jinfeng Bai
150
22
0
02 Nov 2022
Robust MelGAN: A robust universal neural vocoder for high-fidelity TTS
International Symposium on Chinese Spoken Language Processing (ISCSLP), 2022
Kun Song
Jian Cong
Xinsheng Wang
Yongmao Zhang
Linfu Xie
Ning Jiang
Haiying Wu
128
0
0
31 Oct 2022
Xiaoicesing 2: A High-Fidelity Singing Voice Synthesizer Based on Generative Adversarial Network
Interspeech (Interspeech), 2022
Chunhui Wang
Chang Zeng
Xing He
119
20
0
26 Oct 2022
ProDiff: Progressive Fast Diffusion Model For High-Quality Text-to-Speech
ACM Multimedia (ACM MM), 2022
Rongjie Huang
Zhou Zhao
Huadai Liu
Jinglin Liu
Chenye Cui
Yi Ren
DiffM
240
228
0
13 Jul 2022
A Survey on Non-Autoregressive Generation for Neural Machine Translation and Beyond
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Yisheng Xiao
Lijun Wu
Junliang Guo
Juntao Li
Hao Fei
Tao Qin
Tie-Yan Liu
3DV, MedIm, AI4CE
186
109
0
20 Apr 2022
Learn2Sing 2.0: Diffusion and Mutual Information-Based Target Speaker SVS by Learning from Singing Teacher
Interspeech (Interspeech), 2022
Heyang Xue
Xinsheng Wang
Yongmao Zhang
Lei Xie
Pengcheng Zhu
Mengxiao Bi
DiffM
117
14
0
30 Mar 2022