End-to-End Adversarial Text-to-Speech

5 June 2020
Jeff Donahue
Sander Dieleman
Mikolaj Binkowski
Erich Elsen
Karen Simonyan

Papers citing "End-to-End Adversarial Text-to-Speech"

50 / 114 papers shown
Muyan-TTS: A Trainable Text-to-Speech Model Optimized for Podcast Scenarios with a $50K Budget
Xin Li
Kaikai Jia
Hao Sun
Jun Dai
Z. L. Jiang
933
3
0
27 Apr 2025
P2Mark: Plug-and-play Parameter-level Watermarking for Neural Speech Generation
Yong Ren
Jiangyan Yi
Tao Wang
Jianhua Tao
Zhengqi Wen
Chenxing Li
Zheng Lian
Ruibo Fu
Ye Bai
Xiaohui Zhang
439
0
0
07 Apr 2025
Memory-Centric Computing: Recent Advances in Processing-in-DRAM
O. Mutlu
Ataberk Olgun
Geraldo F. Oliveira
Ismail Emir Yüksel
366
11
0
26 Dec 2024
SegINR: Segment-wise Implicit Neural Representation for Sequence Alignment in Neural Text-to-Speech
IEEE Signal Processing Letters (SPL), 2024
Minchan Kim
Myeonghun Jeong
Joun Yeop Lee
Nam Soo Kim
221
1
0
07 Oct 2024
SSDM: Scalable Speech Dysfluency Modeling
Neural Information Processing Systems (NeurIPS), 2024
Jiachen Lian
Xuanru Zhou
Z. Ezzes
Jet M J Vonk
Brittany Morin
D. Baquirin
Zachary Mille
M. G. Tempini
Gopala Anumanchipalli
AuLLM
332
22
0
29 Aug 2024
Central Kurdish Text-to-Speech Synthesis with Novel End-to-End Transformer Training
Hawraz A. Ahmad
Tarik A. Rashid
293
1
0
06 Aug 2024
A Survey of Deep Learning Audio Generation Methods
Matej Bozic
Marko Horvat
VLM
MedIm
351
9
0
31 May 2024
RapVerse: Coherent Vocals and Whole-Body Motions Generations from Text
Jiaben Chen
Xin Yan
Yihang Chen
Siyuan Cen
Zixin Wang
Qinwei Ma
Haoyu Zhen
Kaizhi Qian
Lie Lu
Chuang Gan
357
3
0
30 May 2024
CM-TTS: Enhancing Real Time Text-to-Speech Synthesis Efficiency through Weighted Samplers and Consistency Models
Xiang Li
Fan Bu
Ambuj Mehrish
Yingting Li
Jiale Han
Bo Cheng
Soujanya Poria
DiffM
182
12
0
31 Mar 2024
PAVITS: Exploring Prosody-aware VITS for End-to-End Emotional Voice Conversion
Tianhua Qi
Wenming Zheng
Cheng Lu
Yuan Zong
Hailun Lian
184
16
0
03 Mar 2024
Amphion: An Open-Source Audio, Music and Speech Generation Toolkit
Spoken Language Technology Workshop (SLT), 2023
Xueyao Zhang
Liumeng Xue
Yicheng Gu
Yuancheng Wang
Haorui He
...
Mingxuan Wang
Jun Han
Kai Chen
Haizhou Li
Zhizheng Wu
366
65
0
15 Dec 2023
DQR-TTS: Semi-supervised Text-to-speech Synthesis with Dynamic Quantized Representation
Jiangzong Wang
Pengcheng Li
Xulong Zhang
Ning Cheng
Jing Xiao
385
0
0
14 Nov 2023
E3 TTS: Easy End-to-End Diffusion-based Text to Speech
Automatic Speech Recognition & Understanding (ASRU), 2023
Yuan Gao
Nobuyuki Morioka
Yu Zhang
Nanxin Chen
DiffM
357
48
0
02 Nov 2023
The IMS Toucan System for the Blizzard Challenge 2023
Florian Lux
Julia Koch
Sarina Meyer
Thomas Bott
Nadja Schauffler
Pavel Denisov
Antje Schweitzer
Ngoc Thang Vu
265
10
0
26 Oct 2023
DPP-TTS: Diversifying prosodic features of speech via determinantal point processes
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Seongho Joo
Hyukhun Koh
Kyomin Jung
DiffM
347
0
0
23 Oct 2023
An overview of text-to-speech systems and media applications
Mohammad Reza Hasanabadi
140
8
0
22 Oct 2023
DiffAR: Denoising Diffusion Autoregressive Model for Raw Speech Waveform Generation
International Conference on Learning Representations (ICLR), 2023
Roi Benita
Michael Elad
Joseph Keshet
DiffM
630
12
0
02 Oct 2023
FastGraphTTS: An Ultrafast Syntax-Aware Speech Synthesis Framework
IEEE International Conference on Tools with Artificial Intelligence (ICTAI), 2023
Jianzong Wang
Xulong Zhang
Aolan Sun
Ning Cheng
Jing Xiao
211
2
0
16 Sep 2023
QS-TTS: Towards Semi-Supervised Text-to-Speech Synthesis via Vector-Quantized Self-Supervised Speech Representation Learning
IEEE Transactions on Audio, Speech, and Language Processing (TASLP), 2023
Haohan Guo
Fenglong Xie
Jiawen Kang
Yujia Xiao
Xixin Wu
Helen M. Meng
191
4
0
31 Aug 2023
Towards Improving the Expressiveness of Singing Voice Synthesis with BERT Derived Semantic Information
Interspeech (Interspeech), 2022
Shaohuan Zhou
Shunwei Lei
Weiya You
Deyi Tuo
Yuren You
Zhiyong Wu
Shiyin Kang
Helen Meng
272
4
0
31 Aug 2023
iSTFTNet2: Faster and More Lightweight iSTFT-Based Neural Vocoder Using 1D-2D CNN
Interspeech (Interspeech), 2023
Takuhiro Kaneko
Hirokazu Kameoka
Kou Tanaka
Shogo Seki
205
9
0
14 Aug 2023
VITS2: Improving Quality and Efficiency of Single-Stage Text-to-Speech with Adversarial Learning and Architecture Design
Interspeech (Interspeech), 2023
Jungil Kong
Jihoon Park
Beomjeong Kim
Jeongmin Kim
Dohee Kong
Sangjin Kim
379
71
0
31 Jul 2023
eCat: An End-to-End Model for Multi-Speaker TTS & Many-to-Many Fine-Grained Prosody Transfer
Interspeech (Interspeech), 2023
Ammar Abbas
S. Karlapati
Bastian Schnell
Penny Karanasou
M. G. Moya
Amith Nagaraj
Ayman Boustati
Nicole Peinelt
Alexis Moinet
Thomas Drugman
346
3
0
20 Jun 2023
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
Neural Information Processing Systems (NeurIPS), 2023
Yinghao Aaron Li
Cong Han
Vinay S. Raghavan
Gavin Mischler
N. Mesgarani
VLM
DiffM
370
241
0
13 Jun 2023
PauseSpeech: Natural Speech Synthesis via Pre-trained Language Model and Pause-based Prosody Modeling
Asian Conference on Pattern Recognition (ACPR), 2023
Ji-Sang Hwang
Sang-Hoon Lee
Seong-Whan Lee
242
5
0
13 Jun 2023
HiddenSinger: High-Quality Singing Voice Synthesis via Neural Audio Codec and Latent Diffusion Models
Neural Networks (Neural Netw.), 2023
Ji-Sang Hwang
Sang-Hoon Lee
Seong-Whan Lee
DiffM
228
22
0
12 Jun 2023
The Age of Synthetic Realities: Challenges and Opportunities
APSIPA Transactions on Signal and Information Processing (TASIP), 2023
J. P. Cardenuto
Jing Yang
Rafael Padilha
Renjie Wan
Daniel Moreira
Haoliang Li
Shiqi Wang
Fernanda A. Andaló
Sébastien Marcel
Anderson de Rezende Rocha
DeLMO
330
38
0
09 Jun 2023
Mega-TTS: Zero-Shot Text-to-Speech at Scale with Intrinsic Inductive Bias
Ziyue Jiang
Yi Ren
Zhe Ye
Jinglin Liu
Chen Zhang
...
Rongjie Huang
Chunfeng Wang
Xiang Yin
Zejun Ma
Zhou Zhao
DiffM
306
99
0
06 Jun 2023
Towards Robust FastSpeech 2 by Modelling Residual Multimodality
Interspeech (Interspeech), 2023
Fabian Kögel
Bac Nguyen
Fabien Cardinaux
189
3
0
02 Jun 2023
OTW: Optimal Transport Warping for Time Series
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Fabian Latorre
Chenghao Liu
Doyen Sahoo
Guosheng Lin
OT
AI4TS
224
4
0
01 Jun 2023
DC CoMix TTS: An End-to-End Expressive TTS with Discrete Code Collaborated with Mixer
Interspeech (Interspeech), 2023
Yerin Choi
M. Koo
442
1
0
31 May 2023
Make-A-Voice: Unified Voice Synthesis With Discrete Representation
Rongjie Huang
Chunlei Zhang
Yongqiang Wang
Dongchao Yang
Lu Liu
Zhenhui Ye
Ziyue Jiang
Chao Weng
Zhou Zhao
Dong Yu
DiffM
224
36
0
30 May 2023
ViT-TTS: Visual Text-to-Speech with Scalable Diffusion Transformer
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Huadai Liu
Rongjie Huang
Xuan Lin
Wenqiang Xu
Maozong Zheng
Hong Chen
Jinzheng He
Zhou Zhao
DiffM
403
32
0
22 May 2023
CLAPSpeech: Learning Prosody from Text Context with Contrastive Language-Audio Pre-training
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Zhe Ye
Rongjie Huang
Yi Ren
Ziyue Jiang
Jinglin Liu
Jinzheng He
Xiang Yin
Zhou Zhao
CLIP
182
29
0
18 May 2023
RMSSinger: Realistic-Music-Score based Singing Voice Synthesis
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Jinzheng He
Jinglin Liu
Zhenhui Ye
Rongjie Huang
Chenye Cui
Huadai Liu
Zhou Zhao
DiffM
293
31
0
18 May 2023
Wave-U-Net Discriminator: Fast and Lightweight Discriminator for Generative Adversarial Network-Based Speech Synthesis
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Takuhiro Kaneko
Hirokazu Kameoka
Kou Tanaka
Shogo Seki
221
12
0
24 Mar 2023
A Survey on Audio Diffusion Models: Text To Speech Synthesis and Enhancement in Generative AI
Chenshuang Zhang
Chaoning Zhang
Sheng Zheng
Mengchun Zhang
Maryam Qamar
Sung-Ho Bae
In So Kweon
DiffM
MedIm
339
110
0
23 Mar 2023
An End-to-End Neural Network for Image-to-Audio Transformation
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Liu Chen
Michael Deisher
Munir Georges
193
5
0
10 Mar 2023
FoundationTTS: Text-to-Speech for ASR Customization with Generative Language Model
Rui Xue
Yanqing Liu
Lei He
Xuejiao Tan
Linquan Liu
Ed Lin
Sheng Zhao
372
9
0
06 Mar 2023
PITS: Variational Pitch Inference without Fundamental Frequency for End-to-End Pitch-controllable TTS
Junhyeok Lee
Wonbin Jung
Hyunjae Cho
Jaeyeon Kim
Jaehwan Kim
435
5
0
24 Feb 2023
Unsupervised Data Selection for TTS: Using Arabic Broadcast News as a Case Study
Massa Baali
Tomoki Hayashi
Hamdy Mubarak
Soumi Maiti
Shinji Watanabe
W. El-Hajj
Ahmed M. Ali
237
12
0
22 Jan 2023
SNAC: Speaker-normalized affine coupling layer in flow-based architecture for zero-shot multi-speaker text-to-speech
IEEE Signal Processing Letters (SPL), 2022
Byoung Jin Choi
Myeonghun Jeong
Joun Yeop Lee
N. Kim
223
16
0
30 Nov 2022
Deep Fake Detection, Deterrence and Response: Challenges and Opportunities
Amin Azmoodeh
Ali Dehghantanha
219
4
0
26 Nov 2022
NANSY++: Unified Voice Synthesis with Neural Analysis and Synthesis
International Conference on Learning Representations (ICLR), 2022
Hyeong-Seok Choi
Jinhyeok Yang
Juheon Lee
Hyeongju Kim
363
57
0
17 Nov 2022
DSPGAN: a GAN-based universal vocoder for high-fidelity TTS by time-frequency domain supervision from DSP
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Kun Song
Yongmao Zhang
Yinjiao Lei
Jian Cong
Hanzhao Li
Linfu Xie
Gang He
Jinfeng Bai
269
26
0
02 Nov 2022
Learning utterance-level representations through token-level acoustic latents prediction for Expressive Speech Synthesis
Karolos Nikitaras
Konstantinos Klapsas
Nikolaos Ellinas
Georgia Maniati
June Sig Sung
Inchul Hwang
S. Raptis
Aimilios Chalamandaris
Pirros Tsiakoulis
262
1
0
01 Nov 2022
Uncertainty-DTW for Time Series and Sequences
European Conference on Computer Vision (ECCV), 2022
Lei Wang
Piotr Koniusz
347
47
0
30 Oct 2022
Period VITS: Variational Inference with Explicit Pitch Modeling for End-to-end Emotional Speech Synthesis
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Yuma Shirahata
Ryuichi Yamamoto
Eunwoo Song
Ryo Terashima
Jae-Min Kim
Kentaro Tachibana
299
19
0
28 Oct 2022
Improving robustness of spontaneous speech synthesis with linguistic speech regularization and pseudo-filled-pause insertion
Speech Synthesis Workshop (SSW), 2022
Yuta Matsunaga
Takaaki Saeki
Shinnosuke Takamichi
Hiroshi Saruwatari
343
2
0
18 Oct 2022
Empirical Study Incorporating Linguistic Knowledge on Filled Pauses for Personalized Spontaneous Speech Synthesis
Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 2022
Yuta Matsunaga
Takaaki Saeki
Shinnosuke Takamichi
Hiroshi Saruwatari
295
3
0
14 Oct 2022