2006.03575
End-to-End Adversarial Text-to-Speech
5 June 2020
Jeff Donahue
Sander Dieleman
Mikolaj Binkowski
Erich Elsen
Karen Simonyan
Papers citing "End-to-End Adversarial Text-to-Speech"
50 / 114 papers shown
Muyan-TTS: A Trainable Text-to-Speech Model Optimized for Podcast Scenarios with a $50K Budget
Xin Li
Kaikai Jia
Hao Sun
Jun Dai
Z. L. Jiang
933
3
0
27 Apr 2025
P2Mark: Plug-and-play Parameter-level Watermarking for Neural Speech Generation
Yong Ren
Jiangyan Yi
Tao Wang
Jianhua Tao
Zhengqi Wen
Chenxing Li
Zheng Lian
Ruibo Fu
Ye Bai
Xiaohui Zhang
439
0
0
07 Apr 2025
Memory-Centric Computing: Recent Advances in Processing-in-DRAM
O. Mutlu
Ataberk Olgun
Geraldo F. Oliveira
Ismail Emir Yüksel
366
11
0
26 Dec 2024
SegINR: Segment-wise Implicit Neural Representation for Sequence Alignment in Neural Text-to-Speech
IEEE Signal Processing Letters (SPL), 2024
Minchan Kim
Myeonghun Jeong
Joun Yeop Lee
Nam Soo Kim
221
1
0
07 Oct 2024
SSDM: Scalable Speech Dysfluency Modeling
Neural Information Processing Systems (NeurIPS), 2024
Jiachen Lian
Xuanru Zhou
Z. Ezzes
Jet M J Vonk
Brittany Morin
D. Baquirin
Zachary Miller
M. G. Tempini
Gopala Anumanchipalli
AuLLM
332
22
0
29 Aug 2024
Central Kurdish Text-to-Speech Synthesis with Novel End-to-End Transformer Training
Hawraz A. Ahmad
Tarik A. Rashid
293
1
0
06 Aug 2024
A Survey of Deep Learning Audio Generation Methods
Matej Bozic
Marko Horvat
VLM
MedIm
351
9
0
31 May 2024
RapVerse: Coherent Vocals and Whole-Body Motions Generations from Text
Jiaben Chen
Xin Yan
Yihang Chen
Siyuan Cen
Zixin Wang
Qinwei Ma
Haoyu Zhen
Kaizhi Qian
Lie Lu
Chuang Gan
357
3
0
30 May 2024
CM-TTS: Enhancing Real Time Text-to-Speech Synthesis Efficiency through Weighted Samplers and Consistency Models
Xiang Li
Fan Bu
Ambuj Mehrish
Yingting Li
Jiale Han
Bo Cheng
Soujanya Poria
DiffM
182
12
0
31 Mar 2024
PAVITS: Exploring Prosody-aware VITS for End-to-End Emotional Voice Conversion
Tianhua Qi
Wenming Zheng
Cheng Lu
Yuan Zong
Hailun Lian
184
16
0
03 Mar 2024
Amphion: An Open-Source Audio, Music and Speech Generation Toolkit
Spoken Language Technology Workshop (SLT), 2023
Xueyao Zhang
Liumeng Xue
Yicheng Gu
Yuancheng Wang
Haorui He
...
Mingxuan Wang
Jun Han
Kai Chen
Haizhou Li
Zhizheng Wu
366
65
0
15 Dec 2023
DQR-TTS: Semi-supervised Text-to-speech Synthesis with Dynamic Quantized Representation
Jianzong Wang
Pengcheng Li
Xulong Zhang
Ning Cheng
Jing Xiao
385
0
0
14 Nov 2023
E3 TTS: Easy End-to-End Diffusion-based Text to Speech
Automatic Speech Recognition & Understanding (ASRU), 2023
Yuan Gao
Nobuyuki Morioka
Yu Zhang
Nanxin Chen
DiffM
357
48
0
02 Nov 2023
The IMS Toucan System for the Blizzard Challenge 2023
Florian Lux
Julia Koch
Sarina Meyer
Thomas Bott
Nadja Schauffler
Pavel Denisov
Antje Schweitzer
Ngoc Thang Vu
265
10
0
26 Oct 2023
DPP-TTS: Diversifying prosodic features of speech via determinantal point processes
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Seongho Joo
Hyukhun Koh
Kyomin Jung
DiffM
347
0
0
23 Oct 2023
An overview of text-to-speech systems and media applications
Mohammad Reza Hasanabadi
140
8
0
22 Oct 2023
DiffAR: Denoising Diffusion Autoregressive Model for Raw Speech Waveform Generation
International Conference on Learning Representations (ICLR), 2023
Roi Benita
Michael Elad
Joseph Keshet
DiffM
630
12
0
02 Oct 2023
FastGraphTTS: An Ultrafast Syntax-Aware Speech Synthesis Framework
IEEE International Conference on Tools with Artificial Intelligence (ICTAI), 2023
Jianzong Wang
Xulong Zhang
Aolan Sun
Ning Cheng
Jing Xiao
211
2
0
16 Sep 2023
QS-TTS: Towards Semi-Supervised Text-to-Speech Synthesis via Vector-Quantized Self-Supervised Speech Representation Learning
IEEE Transactions on Audio, Speech, and Language Processing (TASLP), 2023
Haohan Guo
Fenglong Xie
Jiawen Kang
Yujia Xiao
Xixin Wu
Helen M. Meng
191
4
0
31 Aug 2023
Towards Improving the Expressiveness of Singing Voice Synthesis with BERT Derived Semantic Information
Interspeech (Interspeech), 2022
Shaohuan Zhou
Shunwei Lei
Weiya You
Deyi Tuo
Yuren You
Zhiyong Wu
Shiyin Kang
Helen Meng
272
4
0
31 Aug 2023
iSTFTNet2: Faster and More Lightweight iSTFT-Based Neural Vocoder Using 1D-2D CNN
Interspeech (Interspeech), 2023
Takuhiro Kaneko
Hirokazu Kameoka
Kou Tanaka
Shogo Seki
205
9
0
14 Aug 2023
VITS2: Improving Quality and Efficiency of Single-Stage Text-to-Speech with Adversarial Learning and Architecture Design
Interspeech (Interspeech), 2023
Jungil Kong
Jihoon Park
Beomjeong Kim
Jeongmin Kim
Dohee Kong
Sangjin Kim
379
71
0
31 Jul 2023
eCat: An End-to-End Model for Multi-Speaker TTS & Many-to-Many Fine-Grained Prosody Transfer
Interspeech (Interspeech), 2023
Ammar Abbas
S. Karlapati
Bastian Schnell
Penny Karanasou
M. G. Moya
Amith Nagaraj
Ayman Boustati
Nicole Peinelt
Alexis Moinet
Thomas Drugman
346
3
0
20 Jun 2023
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
Neural Information Processing Systems (NeurIPS), 2023
Yinghao Aaron Li
Cong Han
Vinay S. Raghavan
Gavin Mischler
N. Mesgarani
VLM
DiffM
370
241
0
13 Jun 2023
PauseSpeech: Natural Speech Synthesis via Pre-trained Language Model and Pause-based Prosody Modeling
Asian Conference on Pattern Recognition (ACPR), 2023
Ji-Sang Hwang
Sang-Hoon Lee
Seong-Whan Lee
242
5
0
13 Jun 2023
HiddenSinger: High-Quality Singing Voice Synthesis via Neural Audio Codec and Latent Diffusion Models
Neural Networks (Neural Netw.), 2023
Ji-Sang Hwang
Sang-Hoon Lee
Seong-Whan Lee
DiffM
228
22
0
12 Jun 2023
The Age of Synthetic Realities: Challenges and Opportunities
APSIPA Transactions on Signal and Information Processing (TASIP), 2023
J. P. Cardenuto
Jing Yang
Rafael Padilha
Renjie Wan
Daniel Moreira
Haoliang Li
Shiqi Wang
Fernanda A. Andaló
Sébastien Marcel
Anderson de Rezende Rocha
DeLMO
330
38
0
09 Jun 2023
Mega-TTS: Zero-Shot Text-to-Speech at Scale with Intrinsic Inductive Bias
Ziyue Jiang
Yi Ren
Zhe Ye
Jinglin Liu
Chen Zhang
...
Rongjie Huang
Chunfeng Wang
Xiang Yin
Zejun Ma
Zhou Zhao
DiffM
306
99
0
06 Jun 2023
Towards Robust FastSpeech 2 by Modelling Residual Multimodality
Interspeech (Interspeech), 2023
Fabian Kögel
Bac Nguyen
Fabien Cardinaux
189
3
0
02 Jun 2023
OTW: Optimal Transport Warping for Time Series
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Fabian Latorre
Chenghao Liu
Doyen Sahoo
Guosheng Lin
OT
AI4TS
224
4
0
01 Jun 2023
DC CoMix TTS: An End-to-End Expressive TTS with Discrete Code Collaborated with Mixer
Interspeech (Interspeech), 2023
Yerin Choi
M. Koo
442
1
0
31 May 2023
Make-A-Voice: Unified Voice Synthesis With Discrete Representation
Rongjie Huang
Chunlei Zhang
Yongqiang Wang
Dongchao Yang
Lu Liu
Zhenhui Ye
Ziyue Jiang
Chao Weng
Zhou Zhao
Dong Yu
DiffM
224
36
0
30 May 2023
ViT-TTS: Visual Text-to-Speech with Scalable Diffusion Transformer
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Huadai Liu
Rongjie Huang
Xuan Lin
Wenqiang Xu
Maozong Zheng
Hong Chen
Jinzheng He
Zhou Zhao
DiffM
403
32
0
22 May 2023
CLAPSpeech: Learning Prosody from Text Context with Contrastive Language-Audio Pre-training
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Zhe Ye
Rongjie Huang
Yi Ren
Ziyue Jiang
Jinglin Liu
Jinzheng He
Xiang Yin
Zhou Zhao
CLIP
182
29
0
18 May 2023
RMSSinger: Realistic-Music-Score based Singing Voice Synthesis
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Jinzheng He
Jinglin Liu
Zhenhui Ye
Rongjie Huang
Chenye Cui
Huadai Liu
Zhou Zhao
DiffM
293
31
0
18 May 2023
Wave-U-Net Discriminator: Fast and Lightweight Discriminator for Generative Adversarial Network-Based Speech Synthesis
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Takuhiro Kaneko
Hirokazu Kameoka
Kou Tanaka
Shogo Seki
221
12
0
24 Mar 2023
A Survey on Audio Diffusion Models: Text To Speech Synthesis and Enhancement in Generative AI
Chenshuang Zhang
Chaoning Zhang
Sheng Zheng
Mengchun Zhang
Maryam Qamar
Sung-Ho Bae
In So Kweon
DiffM
MedIm
339
110
0
23 Mar 2023
An End-to-End Neural Network for Image-to-Audio Transformation
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Liu Chen
Michael Deisher
Munir Georges
193
5
0
10 Mar 2023
FoundationTTS: Text-to-Speech for ASR Customization with Generative Language Model
Rui Xue
Yanqing Liu
Lei He
Xuejiao Tan
Linquan Liu
Ed Lin
Sheng Zhao
372
9
0
06 Mar 2023
PITS: Variational Pitch Inference without Fundamental Frequency for End-to-End Pitch-controllable TTS
Junhyeok Lee
Wonbin Jung
Hyunjae Cho
Jaeyeon Kim
Jaehwan Kim
435
5
0
24 Feb 2023
Unsupervised Data Selection for TTS: Using Arabic Broadcast News as a Case Study
Massa Baali
Tomoki Hayashi
Hamdy Mubarak
Soumi Maiti
Shinji Watanabe
W. El-Hajj
Ahmed M. Ali
237
12
0
22 Jan 2023
SNAC: Speaker-normalized affine coupling layer in flow-based architecture for zero-shot multi-speaker text-to-speech
IEEE Signal Processing Letters (SPL), 2022
Byoung Jin Choi
Myeonghun Jeong
Joun Yeop Lee
N. Kim
223
16
0
30 Nov 2022
Deep Fake Detection, Deterrence and Response: Challenges and Opportunities
Amin Azmoodeh
Ali Dehghantanha
219
4
0
26 Nov 2022
NANSY++: Unified Voice Synthesis with Neural Analysis and Synthesis
International Conference on Learning Representations (ICLR), 2022
Hyeong-Seok Choi
Jinhyeok Yang
Juheon Lee
Hyeongju Kim
363
57
0
17 Nov 2022
DSPGAN: a GAN-based universal vocoder for high-fidelity TTS by time-frequency domain supervision from DSP
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Kun Song
Yongmao Zhang
Yinjiao Lei
Jian Cong
Hanzhao Li
Linfu Xie
Gang He
Jinfeng Bai
269
26
0
02 Nov 2022
Learning utterance-level representations through token-level acoustic latents prediction for Expressive Speech Synthesis
Karolos Nikitaras
Konstantinos Klapsas
Nikolaos Ellinas
Georgia Maniati
June Sig Sung
Inchul Hwang
S. Raptis
Aimilios Chalamandaris
Pirros Tsiakoulis
262
1
0
01 Nov 2022
Uncertainty-DTW for Time Series and Sequences
European Conference on Computer Vision (ECCV), 2022
Lei Wang
Piotr Koniusz
347
47
0
30 Oct 2022
Period VITS: Variational Inference with Explicit Pitch Modeling for End-to-end Emotional Speech Synthesis
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Yuma Shirahata
Ryuichi Yamamoto
Eunwoo Song
Ryo Terashima
Jae-Min Kim
Kentaro Tachibana
299
19
0
28 Oct 2022
Improving robustness of spontaneous speech synthesis with linguistic speech regularization and pseudo-filled-pause insertion
Speech Synthesis Workshop (SSW), 2022
Yuta Matsunaga
Takaaki Saeki
Shinnosuke Takamichi
Hiroshi Saruwatari
343
2
0
18 Oct 2022
Empirical Study Incorporating Linguistic Knowledge on Filled Pauses for Personalized Spontaneous Speech Synthesis
Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 2022
Yuta Matsunaga
Takaaki Saeki
Shinnosuke Takamichi
Hiroshi Saruwatari
295
3
0
14 Oct 2022