High Fidelity Speech Synthesis with Adversarial Networks

International Conference on Learning Representations (ICLR), 2019
25 September 2019
Mikolaj Binkowski
Jeff Donahue
Sander Dieleman
Aidan Clark
Erich Elsen
Norman Casagrande
Luis C. Cobo
Karen Simonyan

Papers citing "High Fidelity Speech Synthesis with Adversarial Networks"

50 / 153 papers shown
A Survey on Neural Speech Synthesis
Xu Tan
Tao Qin
Frank Soong
Tie-Yan Liu
AI4TS
349
435
0
29 Jun 2021
AI based Presentation Creator With Customized Audio Content Delivery
Muvazima Mansoor
Srikanth Chandar
Ramamoorthy Srinath
176
1
0
27 Jun 2021
Glow-WaveGAN: Learning Speech Representations from GAN-based Variational Auto-Encoder For High Fidelity Flow-based Speech Synthesis
Interspeech (Interspeech), 2021
Jian Cong
Shan Yang
Lei Xie
Jane Polak Scowcroft
DRL
166
29
0
21 Jun 2021
WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis
Interspeech (Interspeech), 2021
Nanxin Chen
Yu Zhang
Heiga Zen
Ron J. Weiss
Mohammad Norouzi
Najim Dehak
William Chan
DiffM
213
97
0
17 Jun 2021
Non Gaussian Denoising Diffusion Models
Eliya Nachmani
Robin San Roman
Lior Wolf
VLM
DiffM
163
59
0
14 Jun 2021
Catch-A-Waveform: Learning to Generate Audio from a Single Short Example
Neural Information Processing Systems (NeurIPS), 2021
Gal Greshler
Tamar Rott Shaham
T. Michaeli
197
27
0
11 Jun 2021
Sprachsynthese -- State-of-the-Art in englischer und deutscher Sprache
René Peinl
153
0
0
11 Jun 2021
Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
International Conference on Machine Learning (ICML), 2021
Jaehyeon Kim
Jungil Kong
Juhee Son
DRL
298
1,151
0
11 Jun 2021
Fre-GAN: Adversarial Frequency-consistent Audio Synthesis
Interspeech (Interspeech), 2021
Ji-Hoon Kim
Sang-Hoon Lee
Ji-Hyun Lee
Seong-Whan Lee
198
60
0
04 Jun 2021
NVC-Net: End-to-End Adversarial Voice Conversion
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021
Bac Nguyen Cong
Fabien Cardinaux
AAML
195
48
0
02 Jun 2021
ItôTTS and ItôWave: Linear Stochastic Differential Equation Is All You Need For Audio Generation
Shoule Wu
Ziqiang Shi
DiffM
238
11
0
17 May 2021
Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech
International Conference on Machine Learning (ICML), 2021
Vadim Popov
Ivan Vovk
Vladimir Gogoryan
Tasnima Sadekova
Mikhail Kudinov
DiffM
395
660
0
13 May 2021
VQCPC-GAN: Variable-Length Adversarial Audio Synthesis Using Vector-Quantized Contrastive Predictive Coding
IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 2021
J. Nistal
Cyran Aouameur
Stefan Lattner
G. Richard
235
7
0
04 May 2021
VideoGPT: Video Generation using VQ-VAE and Transformers
Wilson Yan
Yunzhi Zhang
Pieter Abbeel
A. Srinivas
ViT
VGen
632
643
0
20 Apr 2021
Noise Estimation for Generative Diffusion Models
Robin San-Roman
Eliya Nachmani
Lior Wolf
DiffM
288
117
0
06 Apr 2021
Deepfakes Generation and Detection: State-of-the-art, open challenges, countermeasures, and way forward
Momina Masood
M. Nawaz
K. Malik
A. Javed
Aun Irtaza
AAML
504
410
0
25 Feb 2021
MaskCycleGAN-VC: Learning Non-parallel Voice Conversion with Filling in Frames
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021
Takuhiro Kaneko
Hirokazu Kameoka
Kou Tanaka
Nobukatsu Hojo
127
71
0
25 Feb 2021
AudioVisual Speech Synthesis: A brief literature review
Efthymios Georgiou
Athanasios Katsamanis
77
0
0
18 Feb 2021
High Fidelity Speech Regeneration with Application to Speech Enhancement
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021
Adam Polyak
Lior Wolf
Yossi Adi
Ori Kabeli
Yaniv Taigman
154
19
0
31 Jan 2021
Fully Non-autoregressive Neural Machine Translation: Tricks of the Trade
Findings (Findings), 2020
Jiatao Gu
X. Kong
246
144
0
31 Dec 2020
MelGlow: Efficient Waveform Generative Network Based on Location-Variable Convolution
Spoken Language Technology Workshop (SLT), 2020
Zhen Zeng
Jianzong Wang
Ning Cheng
Jing Xiao
153
8
0
03 Dec 2020
A Comprehensive Survey on Deep Music Generation: Multi-level Representations, Algorithms, Evaluations, and Future Directions
Shulei Ji
Jing Luo
Xinyu Yang
MGen
268
143
0
13 Nov 2020
Wave-Tacotron: Spectrogram-free end-to-end text-to-speech synthesis
Ron J. Weiss
RJ Skerry-Ryan
Eric Battenberg
Soroosh Mariooryad
Diederik P. Kingma
196
106
0
06 Nov 2020
StyleMelGAN: An Efficient High-Fidelity Adversarial Vocoder with Temporal Adaptive Normalization
Ahmed Mustafa
N. Pia
Guillaume Fuchs
180
90
0
03 Nov 2020
Speech Synthesis and Control Using Differentiable DSP
Giorgio Fabbro
Vladimir Golkov
Thomas Kemp
Zorah Lähner
181
13
0
28 Oct 2020
Upsampling artifacts in neural audio synthesis
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020
Jordi Pons
Santiago Pascual
Giulio Cengarle
Joan Serrà
213
69
0
27 Oct 2020
Parallel waveform synthesis based on generative adversarial networks with voicing-aware conditional discriminators
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020
Ryuichi Yamamoto
Eunwoo Song
Min-Jae Hwang
Jae-Min Kim
175
19
0
27 Oct 2020
CLAR: Contrastive Learning of Auditory Representations
International Conference on Artificial Intelligence and Statistics (AISTATS), 2020
Haider Al-Tahan
Y. Mohsenzadeh
SSL
380
67
0
19 Oct 2020
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
Jungil Kong
Jaehyeon Kim
Jaekyoung Bae
493
2,433
0
12 Oct 2020
The NU Voice Conversion System for the Voice Conversion Challenge 2020: On the Effectiveness of Sequence-to-sequence Models and Autoregressive Neural Vocoders
Wen-Chin Huang
Patrick Lumban Tobing
Yi-Chiao Wu
Kazuhiro Kobayashi
Tomoki Toda
172
9
0
09 Oct 2020
DiffWave: A Versatile Diffusion Model for Audio Synthesis
International Conference on Learning Representations (ICLR), 2020
Zhifeng Kong
Ming-Yu Liu
Jiaji Huang
Kexin Zhao
Bryan Catanzaro
DiffM
BDL
672
1,759
0
21 Sep 2020
HiFiSinger: Towards High-Fidelity Neural Singing Voice Synthesis
Jiawei Chen
Xu Tan
Jian Luan
Tao Qin
Tie-Yan Liu
VLM
218
106
0
03 Sep 2020
WaveGrad: Estimating Gradients for Waveform Generation
International Conference on Learning Representations (ICLR), 2020
Nanxin Chen
Yu Zhang
Heiga Zen
Ron J. Weiss
Mohammad Norouzi
William Chan
DiffM
BDL
410
886
0
02 Sep 2020
Prosody Learning Mechanism for Speech Synthesis System Without Text Length Limit
Interspeech (Interspeech), 2020
Zhen Zeng
Jianzong Wang
Ning Cheng
Jing Xiao
138
9
0
13 Aug 2020
An Overview of Voice Conversion and its Challenges: From Statistical Modeling to Deep Learning
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2020
Berrak Sisman
Junichi Yamagishi
Simon King
Haizhou Li
BDL
440
389
0
09 Aug 2020
A Spectral Energy Distance for Parallel Speech Synthesis
A. Gritsenko
Tim Salimans
Rianne van den Berg
Jasper Snoek
Nal Kalchbrenner
236
78
0
03 Aug 2020
Adversarially Trained Multi-Singer Sequence-To-Sequence Singing Synthesizer
Jie Wu
Jian Luan
158
28
0
18 Jun 2020
FastPitch: Parallel Text-to-speech with Pitch Prediction
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020
Adrian Łańcucki
265
388
0
11 Jun 2020
HiFi-GAN: High-Fidelity Denoising and Dereverberation Based on Speech Deep Features in Adversarial Networks
Interspeech (Interspeech), 2020
Jiaqi Su
Zeyu Jin
Adam Finkelstein
171
155
0
10 Jun 2020
End-to-End Adversarial Text-to-Speech
Jeff Donahue
Sander Dieleman
Mikolaj Binkowski
Erich Elsen
Karen Simonyan
317
192
0
05 Jun 2020
Speech-to-Singing Conversion based on Boundary Equilibrium GAN
Interspeech (Interspeech), 2020
Da-Yi Wu
Yi-Hsuan Yang
GAN
205
8
0
28 May 2020
Quasi-Periodic Parallel WaveGAN Vocoder: A Non-autoregressive Pitch-dependent Dilated Convolution Model for Parametric Speech Generation
Yi-Chiao Wu
Tomoki Hayashi
T. Okamoto
Hisashi Kawai
Tomoki Toda
152
4
0
18 May 2020
Flowtron: an Autoregressive Flow-based Generative Network for Text-to-Speech Synthesis
Rafael Valle
Kevin J. Shih
R. Prenger
Bryan Catanzaro
271
131
0
12 May 2020
Multi-band MelGAN: Faster Waveform Generation for High-Quality Text-to-Speech
Geng Yang
Shan Yang
Kai-Chun Liu
Peng Fang
Wei Chen
Lei Xie
225
225
0
11 May 2020
GACELA -- A generative adversarial context encoder for long audio inpainting
Andrés Marafioti
P. Majdak
Nicki Holighaus
Nathanael Perraudin
266
51
0
11 May 2020
Cotatron: Transcription-Guided Speech Encoder for Any-to-Many Voice Conversion without Parallel Data
Seung-won Park
Doo-young Kim
Myun-chul Joe
155
45
0
07 May 2020
Conditional Spoken Digit Generation with StyleGAN
Interspeech (Interspeech), 2020
Kasperi Palkama
Lauri Juvela
Alexander Ilin
GAN
224
11
0
28 Apr 2020
Transformation-based Adversarial Video Prediction on Large-Scale Data
Pauline Luc
Aidan Clark
Sander Dieleman
Diego de Las Casas
Yotam Doron
Albin Cassirer
Karen Simonyan
VGen
1.0K
91
0
09 Mar 2020
A Limited-Capacity Minimax Theorem for Non-Convex Games or: How I Learned to Stop Worrying about Mixed-Nash and Love Neural Nets
Gauthier Gidel
David Balduzzi
Wojciech M. Czarnecki
M. Garnelo
Yoram Bachrach
255
7
0
14 Feb 2020
Score and Lyrics-Free Singing Voice Generation
International Conference on Innovative Computing and Cloud Computing (ICCC), 2019
Jen-Yu Liu
Yu-Hua Chen
Yin-Cheng Yeh
Yi-Hsuan Yang
170
23
0
26 Dec 2019