Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2006.03575
Cited By
End-to-End Adversarial Text-to-Speech
5 June 2020
Jeff Donahue
Sander Dieleman
Mikolaj Binkowski
Erich Elsen
Karen Simonyan
Re-assign community
ArXiv
PDF
HTML
Papers citing
"End-to-End Adversarial Text-to-Speech"
45 / 45 papers shown
Title
Muyan-TTS: A Trainable Text-to-Speech Model Optimized for Podcast Scenarios with a
50
K
B
u
d
g
e
t
50K Budget
50
K
B
u
d
g
e
t
Xin Li
Kaikai Jia
Hao Sun
Jun Dai
Z. L. Jiang
126
0
0
27 Apr 2025
P2Mark: Plug-and-play Parameter-level Watermarking for Neural Speech Generation
Yong Ren
Jiangyan Yi
Tao Wang
J. Tao
Zhengqi Wen
Chenxing Li
Zheng Lian
Ruibo Fu
Ye Bai
Xiaohui Zhang
58
0
0
07 Apr 2025
Amphion: An Open-Source Audio, Music and Speech Generation Toolkit
Xueyao Zhang
Liumeng Xue
Yicheng Gu
Yuancheng Wang
Haorui He
...
Mingxuan Wang
Jun Han
Kai Chen
Haizhou Li
Zhizheng Wu
29
26
0
15 Dec 2023
DPP-TTS: Diversifying prosodic features of speech via determinantal point processes
Seongho Joo
Hyukhun Koh
Kyomin Jung
DiffM
41
0
0
23 Oct 2023
VITS2: Improving Quality and Efficiency of Single-Stage Text-to-Speech with Adversarial Learning and Architecture Design
Jungil Kong
Jihoon Park
Beomjeong Kim
Jeongmin Kim
Dohee Kong
Sangjin Kim
34
35
0
31 Jul 2023
ViT-TTS: Visual Text-to-Speech with Scalable Diffusion Transformer
Huadai Liu
Rongjie Huang
Xuan Lin
Wenqiang Xu
Maozong Zheng
Hong Chen
Jinzheng He
Zhou Zhao
DiffM
36
20
0
22 May 2023
FoundationTTS: Text-to-Speech for ASR Customization with Generative Language Model
Rui Xue
Yanqing Liu
Lei He
Xuejiao Tan
Linquan Liu
Ed Lin
Sheng Zhao
28
7
0
06 Mar 2023
PITS: Variational Pitch Inference without Fundamental Frequency for End-to-End Pitch-controllable TTS
Junhyeok Lee
Wonbin Jung
Hyunjae Cho
Jaeyeon Kim
Jaehwan Kim
17
3
0
24 Feb 2023
Deep Fake Detection, Deterrence and Response: Challenges and Opportunities
Amin Azmoodeh
Ali Dehghantanha
39
2
0
26 Nov 2022
NANSY++: Unified Voice Synthesis with Neural Analysis and Synthesis
Hyeong-Seok Choi
Jinhyeok Yang
Juheon Lee
Hyeongju Kim
20
46
0
17 Nov 2022
Learning utterance-level representations through token-level acoustic latents prediction for Expressive Speech Synthesis
Karolos Nikitaras
Konstantinos Klapsas
Nikolaos Ellinas
Georgia Maniati
June Sig Sung
Inchul Hwang
S. Raptis
Aimilios Chalamandaris
Pirros Tsiakoulis
14
0
0
01 Nov 2022
Uncertainty-DTW for Time Series and Sequences
Lei Wang
Piotr Koniusz
11
33
0
30 Oct 2022
Adversarial Speaker-Consistency Learning Using Untranscribed Speech Data for Zero-Shot Multi-Speaker Text-to-Speech
Byoung Jin Choi
Myeonghun Jeong
Minchan Kim
Sung Hwan Mun
N. Kim
DiffM
27
5
0
12 Oct 2022
Multi-Task Adversarial Training Algorithm for Multi-Speaker Neural Text-to-Speech
Yusuke Nakai
Yuki Saito
K. Udagawa
Hiroshi Saruwatari
AAML
17
1
0
26 Sep 2022
ProDiff: Progressive Fast Diffusion Model For High-Quality Text-to-Speech
Rongjie Huang
Zhou Zhao
Huadai Liu
Jinglin Liu
Chenye Cui
Yi Ren
DiffM
44
193
0
13 Jul 2022
Deep Learning Enabled Semantic Communications with Speech Recognition and Synthesis
Zhenzi Weng
Zhijin Qin
Xiaoming Tao
Chengkang Pan
Guangyi Liu
Geoffrey Ye Li
33
132
0
09 May 2022
NaturalSpeech: End-to-End Text to Speech Synthesis with Human-Level Quality
Xu Tan
Jiawei Chen
Haohe Liu
Jian Cong
Chen Zhang
...
Lei He
Frank Soong
Tao Qin
Sheng Zhao
Tie-Yan Liu
44
211
0
09 May 2022
FastDiff: A Fast Conditional Diffusion Model for High-Quality Speech Synthesis
Rongjie Huang
Max W. Y. Lam
Jun Wang
Dan Su
Dong Yu
Yi Ren
Zhou Zhao
DiffM
28
165
0
21 Apr 2022
Fine-grained Noise Control for Multispeaker Speech Synthesis
Karolos Nikitaras
G. Vamvoukakis
Nikolaos Ellinas
Konstantinos Klapsas
K. Markopoulos
S. Raptis
June Sig Sung
Gunu Jho
Aimilios Chalamandaris
Pirros Tsiakoulis
24
4
0
11 Apr 2022
JETS: Jointly Training FastSpeech2 and HiFi-GAN for End to End Text to Speech
D. Lim
Sunghee Jung
Eesung Kim
17
51
0
31 Mar 2022
End-to-end LPCNet: A Neural Vocoder With Fully-Differentiable LPC Estimation
Krishna Subramani
J. Valin
Umut Isik
Paris Smaragdis
A. Krishnaswamy
21
11
0
23 Feb 2022
DiffGAN-TTS: High-Fidelity and Efficient Text-to-Speech with Denoising Diffusion GANs
Songxiang Liu
Dan Su
Dong Yu
DiffM
68
65
0
28 Jan 2022
Opencpop: A High-Quality Open Source Chinese Popular Song Corpus for Singing Voice Synthesis
Yu Wang
Xinsheng Wang
Pengcheng Zhu
Jie Wu
Hanzhao Li
Heyang Xue
Yongmao Zhang
Lei Xie
Mengxiao Bi
25
95
0
19 Jan 2022
BLT: Bidirectional Layout Transformer for Controllable Layout Generation
X. Kong
Lu Jiang
Huiwen Chang
Han Zhang
Yuan Hao
Haifeng Gong
Irfan Essa
41
79
0
09 Dec 2021
WaveFake: A Data Set to Facilitate Audio Deepfake Detection
Joel Frank
Lea Schonherr
DiffM
129
123
0
04 Nov 2021
Style Equalization: Unsupervised Learning of Controllable Generative Sequence Models
Jen-Hao Rick Chang
A. Shrivastava
H. Koppula
Xiaoshuai Zhang
Oncel Tuzel
DiffM
51
16
0
06 Oct 2021
Low-Latency Incremental Text-to-Speech Synthesis with Distilled Context Prediction Network
Takaaki Saeki
Shinnosuke Takamichi
Hiroshi Saruwatari
26
3
0
22 Sep 2021
AligNART: Non-autoregressive Neural Machine Translation by Jointly Learning to Estimate Alignment and Translate
Jongyoon Song
Sungwon Kim
Sungroh Yoon
66
37
0
14 Sep 2021
Neural HMMs are all you need (for high-quality attention-free TTS)
Shivam Mehta
Éva Székely
Jonas Beskow
G. Henter
30
18
0
30 Aug 2021
GC-TTS: Few-shot Speaker Adaptation with Geometric Constraints
Ji-Hoon Kim
Sang-Hoon Lee
Ji-Hyun Lee
Hong G Jung
Seong-Whan Lee
39
6
0
16 Aug 2021
A Survey on Neural Speech Synthesis
Xu Tan
Tao Qin
Frank Soong
Tie-Yan Liu
AI4TS
18
352
0
29 Jun 2021
WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis
Nanxin Chen
Yu Zhang
Heiga Zen
Ron J. Weiss
Mohammad Norouzi
Najim Dehak
William Chan
DiffM
21
88
0
17 Jun 2021
Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
Jaehyeon Kim
Jungil Kong
Juhee Son
DRL
51
840
0
11 Jun 2021
Review of end-to-end speech synthesis technology based on deep learning
Zhaoxi Mu
Xinyu Yang
Yizhuo Dong
AuLLM
ALM
26
24
0
20 Apr 2021
Attention Forcing for Machine Translation
Qingyun Dou
Yiting Lu
Potsawee Manakul
Xixin Wu
Mark J. F. Gales
23
7
0
02 Apr 2021
VARA-TTS: Non-Autoregressive Text-to-Speech Synthesis based on Very Deep VAE with Residual Attention
Peng Liu
Yuewen Cao
Songxiang Liu
Na Hu
Guangzhi Li
Chao Weng
Dan Su
34
22
0
12 Feb 2021
Fully Non-autoregressive Neural Machine Translation: Tricks of the Trade
Jiatao Gu
X. Kong
23
135
0
31 Dec 2020
Wave-Tacotron: Spectrogram-free end-to-end text-to-speech synthesis
Ron J. Weiss
RJ Skerry-Ryan
Eric Battenberg
Soroosh Mariooryad
Diederik P. Kingma
19
97
0
06 Nov 2020
Parallel Tacotron: Non-Autoregressive and Controllable TTS
Isaac Elias
Heiga Zen
Jonathan Shen
Yu Zhang
Ye Jia
Ron J. Weiss
Yonghui Wu
DRL
17
102
0
22 Oct 2020
DiffWave: A Versatile Diffusion Model for Audio Synthesis
Zhifeng Kong
Ming-Yu Liu
Jiaji Huang
Kexin Zhao
Bryan Catanzaro
DiffM
BDL
34
1,389
0
21 Sep 2020
FastSpeech 2: Fast and High-Quality End-to-End Text to Speech
Yi Ren
Chenxu Hu
Xu Tan
Tao Qin
Sheng Zhao
Zhou Zhao
Tie-Yan Liu
54
1,355
0
08 Jun 2020
DDSP: Differentiable Digital Signal Processing
Jesse Engel
Lamtharn Hantrakul
Chenjie Gu
Adam Roberts
DiffM
94
373
0
14 Jan 2020
High Fidelity Speech Synthesis with Adversarial Networks
Mikolaj Binkowski
Jeff Donahue
Sander Dieleman
Aidan Clark
Erich Elsen
Norman Casagrande
Luis C. Cobo
Karen Simonyan
223
239
0
25 Sep 2019
Soft-DTW: a Differentiable Loss Function for Time-Series
Marco Cuturi
Mathieu Blondel
AI4TS
141
611
0
05 Mar 2017
A Learned Representation For Artistic Style
Vincent Dumoulin
Jonathon Shlens
M. Kudlur
GAN
214
1,156
0
24 Oct 2016
1