High Fidelity Speech Synthesis with Adversarial Networks

International Conference on Learning Representations (ICLR), 2019
25 September 2019
Mikolaj Binkowski
Jeff Donahue
Sander Dieleman
Aidan Clark
Erich Elsen
Norman Casagrande
Luis C. Cobo
Karen Simonyan

Papers citing "High Fidelity Speech Synthesis with Adversarial Networks"

50 / 153 papers shown
MARS: Audio Generation via Multi-Channel Autoregression on Spectrograms
Eleonora Ristori
Luca Bindini
Paolo Frasconi
104
0
0
30 Sep 2025
Scaling to Multimodal and Multichannel Heart Sound Classification with Synthetic and Augmented Biosignals
Milan Marocchi
Matthew Fynn
Kayapanda Mandana
Yue Rong
151
0
0
15 Sep 2025
MEAN-RIR: Multi-Modal Environment-Aware Network for Robust Room Impulse Response Estimation
Jiajian Chen
Jiakang Chen
Hang Chen
Qing Wang
Yu Gao
Jun Du
76
1
0
05 Sep 2025
Layer-wise Analysis for Quality of Multilingual Synthesized Speech
Erica Cooper
T. Okamoto
Yamato Ohtani
Tomoki Toda
Hisashi Kawai
116
0
0
05 Sep 2025
WaveLLDM: Design and Development of a Lightweight Latent Diffusion Model for Speech Enhancement and Restoration
Kevin Putra Santoso
Rizka Wakhidatus Sholikah
Raden Venantius Hari Ginardi
147
0
0
28 Aug 2025
DARAS: Dynamic Audio-Room Acoustic Synthesis for Blind Room Impulse Response Estimation
Chunxi Wang
Maoshen Jia
Wenyu Jin
112
0
0
10 Jul 2025
DPN-GAN: Inducing Periodic Activations in Generative Adversarial Networks for High-Fidelity Audio Synthesis
IEEE Access (IEEE Access), 2025
Zeeshan Ahmad
Shudi Bao
Meng Chen
215
1
0
14 May 2025
Hierarchical Conditional Tabular GAN for Multi-Tabular Synthetic Data Generation
Wilhelm Ågren
Victorio Úbeda Sosa
235
2
0
11 Nov 2024
Generative Deep Learning and Signal Processing for Data Augmentation of Cardiac Auscultation Signals: Improving Model Robustness Using Synthetic Audio
Biomedical Signal Processing and Control (BSPC), 2024
Leigh Abbott
Milan Marocchi
Matthew Fynn
Yue Rong
Sven Nordholm
MedIm
203
2
0
14 Oct 2024
PeriodWave: Multi-Period Flow Matching for High-Fidelity Waveform Generation
International Conference on Learning Representations (ICLR), 2024
Sang-Hoon Lee
Ha-Yeong Choi
Seong-Whan Lee
OOD, DiffM, AI4TS
301
13
0
14 Aug 2024
Coarse-to-Fine Proposal Refinement Framework for Audio Temporal Forgery Detection and Localization
Junyan Wu
Wei Lu
Xiangyang Luo
Rui Yang
Qian Wang
Xiaochun Cao
237
11
0
23 Jul 2024
A Survey of Deep Learning Audio Generation Methods
Matej Bozic
Marko Horvat
VLM, MedIm
299
9
0
31 May 2024
Sok: Comprehensive Security Overview, Challenges, and Future Directions of Voice-Controlled Systems
Haozhe Xu
Cong Wu
Yangyang Gu
Xingcan Shang
Jing Chen
Kun He
Ruiying Du
260
4
0
27 May 2024
Speaking in Wavelet Domain: A Simple and Efficient Approach to Speed up Speech Diffusion Model
Xiangyu Zhang
Daijiao Liu
Hexin Liu
Qiquan Zhang
Hanyu Meng
Leibny Paola García
Chng Eng Siong
Lina Yao
DiffM
214
13
0
16 Feb 2024
Brain-Conditional Multimodal Synthesis: A Survey and Taxonomy
IEEE Transactions on Artificial Intelligence (IEEE TAI), 2023
Weijian Mai
Jian Zhang
Pengfei Fang
Zhijun Zhang
457
14
0
31 Dec 2023
The Effects of Signal-to-Noise Ratio on Generative Adversarial Networks Applied to Marine Bioacoustic Data
Georgia Atkinson
Nick Wright
A. Mcgough
Per Berggren
GAN
182
0
0
22 Dec 2023
A Representative Study on Human Detection of Artificially Generated Media Across Countries
Joel Frank
Franziska Herbert
Jonas Ricker
Lea Schonherr
Thorsten Eisenhofer
Asja Fischer
Markus Dürmuth
Thorsten Holz
249
28
0
10 Dec 2023
FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores
International Conference on Learning Representations (ICLR), 2023
Daniel Y. Fu
Hermann Kumbong
Eric N. D. Nguyen
Christopher Ré
VLM
252
38
0
10 Nov 2023
Enabling Acoustic Audience Feedback in Large Virtual Events
Tamay Aykut
M. Hofbauer
Christopher B. Kuhn
Eckehard Steinbach
Bernd Girod
166
0
0
27 Oct 2023
Evaluating Speech Synthesis by Training Recognizers on Synthetic Speech
Dareen Alharthi
Roshan S. Sharma
Hira Dhamyal
Soumi Maiti
Bhiksha Raj
Rita Singh
142
7
0
01 Oct 2023
VoiceFlow: Efficient Text-to-Speech with Rectified Flow Matching
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Yiwei Guo
Chenpeng Du
Ziyang Ma
Xie Chen
K. Yu
DiffM
264
61
0
10 Sep 2023
AI-Generated Content (AIGC) for Various Data Modalities: A Survey
ACM Computing Surveys (ACM Comput. Surv.), 2023
Lin Geng Foo
Hossein Rahmani
Jing Liu
708
45
0
27 Aug 2023
VITS2: Improving Quality and Efficiency of Single-Stage Text-to-Speech with Adversarial Learning and Architecture Design
Interspeech (Interspeech), 2023
Jungil Kong
Jihoon Park
Beomjeong Kim
Jeongmin Kim
Dohee Kong
Sangjin Kim
205
66
0
31 Jul 2023
The Ethical Implications of Generative Audio Models: A Systematic Literature Review
AAAI/ACM Conference on AI, Ethics, and Society (AIES), 2023
J. Barnett
251
47
0
07 Jul 2023
LipVoicer: Generating Speech from Silent Videos Guided by Lip Reading
International Conference on Learning Representations (ICLR), 2023
Yochai Yemini
Aviv Shamsian
Lior Bracha
Sharon Gannot
Ethan Fetaya
DiffM
326
22
0
05 Jun 2023
UnDiff: Unsupervised Voice Restoration with Unconditional Diffusion Model
Interspeech (Interspeech), 2023
A. Iashchenko
Pavel Andreev
Ivan Shchekotov
Nicholas Babaev
Dmitry Vetrov
DiffM
324
7
0
01 Jun 2023
U-DiT TTS: U-Diffusion Vision Transformer for Text-to-Speech
Xin Jing
Yi Chang
Zijiang Yang
Jiang-jian Xie
Andreas Triantafyllopoulos
Bjoern W. Schuller
206
11
0
22 May 2023
APNet: An All-Frame-Level Neural Vocoder Incorporating Direct Prediction of Amplitude and Phase Spectra
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023
Yang Ai
Zhenhua Ling
170
25
0
13 May 2023
Learn to Sing by Listening: Building Controllable Virtual Singer by Unsupervised Learning from Voice Recordings
Wei Xue
Yiwen Wang
Qi-fei Liu
Yi-Ting Guo
175
1
0
09 May 2023
Source-Filter-Based Generative Adversarial Neural Vocoder for High Fidelity Speech Synthesis
Ye-Xin Lu
Yang Ai
Zhenhua Ling
175
1
0
26 Apr 2023
ArmanTTS single-speaker Persian dataset
Mohammd Hasan Shamgholi
Vahid Saeedi
J. Peymanfard
Leila Alhabib
Hossein Zeinali
93
3
0
07 Apr 2023
A Survey on Audio Diffusion Models: Text To Speech Synthesis and Enhancement in Generative AI
Chenshuang Zhang
Chaoning Zhang
Sheng Zheng
Mengchun Zhang
Maryam Qamar
Sung-Ho Bae
In So Kweon
DiffM, MedIm
252
106
0
23 Mar 2023
Speech Modeling with a Hierarchical Transformer Dynamical VAE
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Xiaoyu Lin
Xiaoyu Bie
Simon Leglaive
Laurent Girin
Xavier Alameda-Pineda
BDL
178
3
0
07 Mar 2023
Contrast-PLC: Contrastive Learning for Packet Loss Concealment
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Huaying Xue
Xiulian Peng
Yan Lu
173
7
0
26 Feb 2023
Separate And Diffuse: Using a Pretrained Diffusion Model for Improving Source Separation
Shahar Lutati
Eliya Nachmani
Lior Wolf
DiffM
205
19
0
25 Jan 2023
MooseNet: A Trainable Metric for Synthesized Speech with a PLDA Module
Speech Synthesis Workshop (SSW), 2023
Ondřej Plátek
Ondrej Dusek
176
2
0
17 Jan 2023
ResGrad: Residual Denoising Diffusion Probabilistic Models for Text to Speech
Ze Chen
Yihan Wu
Yichong Leng
Jiawei Chen
Haohe Liu
...
Ke Wang
Lei He
Sheng Zhao
Jiang Bian
Danilo Mandic
DiffM
204
26
0
30 Dec 2022
Semantics-Empowered Communication: A Tutorial-cum-Survey
Zhilin Lu
Rongpeng Li
Kun Lu
Xianfu Chen
Ekram Hossain
Zhifeng Zhao
Honggang Zhang
503
23
0
16 Dec 2022
BLASER: A Text-Free Speech-to-Speech Translation Evaluation Metric
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Mingda Chen
Paul-Ambroise Duquenne
Pierre Yves Andrews
Justine T. Kao
Alexandre Mourachko
Holger Schwenk
Marta R. Costa-jussá
246
23
0
16 Dec 2022
Evaluating and reducing the distance between synthetic and real speech distributions
Interspeech (Interspeech), 2022
Christoph Minixhofer
Ondřej Klejch
P. Bell
217
9
0
29 Nov 2022
Deep Fake Detection, Deterrence and Response: Challenges and Opportunities
Amin Azmoodeh
Ali Dehghantanha
173
3
0
26 Nov 2022
Towards Building Text-To-Speech Systems for the Next Billion Users
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Gokul Karthik Kumar
V. PraveenS.
Pratyush Kumar
Mitesh M. Khapra
Karthik Nandakumar
271
28
0
17 Nov 2022
NANSY++: Unified Voice Synthesis with Neural Analysis and Synthesis
International Conference on Learning Representations (ICLR), 2022
Hyeong-Seok Choi
Jinhyeok Yang
Juheon Lee
Hyeongju Kim
228
54
0
17 Nov 2022
Efficiently Trained Low-Resource Mongolian Text-to-Speech System Based On FullConv-TTS
Ziqi Liang
178
0
0
24 Oct 2022
Adversarial Permutation Invariant Training for Universal Sound Separation
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Emilian Postolache
Jordi Pons
Santiago Pascual
Joan Serrà
VLM
269
10
0
21 Oct 2022
Improving robustness of spontaneous speech synthesis with linguistic speech regularization and pseudo-filled-pause insertion
Speech Synthesis Workshop (SSW), 2022
Yuta Matsunaga
Takaaki Saeki
Shinnosuke Takamichi
Hiroshi Saruwatari
212
2
0
18 Oct 2022
Images as Weight Matrices: Sequential Image Generation Through Synaptic Learning Rules
International Conference on Learning Representations (ICLR), 2022
Kazuki Irie
Jürgen Schmidhuber
229
9
0
07 Oct 2022
Mandarin Singing Voice Synthesis with Denoising Diffusion Probabilistic Wasserstein GAN
Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 2022
Yin-Ping Cho
Yu Tsao
Hsin-Min Wang
Yi-Wen Liu
DiffM
221
9
0
21 Sep 2022
Lightweight Long-Range Generative Adversarial Networks
Bowen Li
Thomas Lukasiewicz
GAN
178
4
0
08 Sep 2022
AudioLM: a Language Modeling Approach to Audio Generation
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2022
Zalan Borsos
Raphaël Marinier
Damien Vincent
Eugene Kharitonov
Olivier Pietquin
...
Dominik Roblek
O. Teboul
David Grangier
Marco Tagliasacchi
Neil Zeghidour
AuLLM
392
813
0
07 Sep 2022