
High Fidelity Speech Synthesis with Adversarial Networks

International Conference on Learning Representations (ICLR), 2019

25 September 2019

Papers citing "High Fidelity Speech Synthesis with Adversarial Networks"

50 / 153 papers shown

MARS: Audio Generation via Multi-Channel Autoregression on Spectrograms Eleonora Ristori Luca Bindini Paolo Frasconi 104 0 0 30 Sep 2025
Scaling to Multimodal and Multichannel Heart Sound Classification with Synthetic and Augmented Biosignals Milan Marocchi Matthew Fynn Kayapanda Mandana Yue Rong 151 0 0 15 Sep 2025
MEAN-RIR: Multi-Modal Environment-Aware Network for Robust Room Impulse Response Estimation Jiajian Chen Jiakang Chen Hang Chen Qing Wang Yu Gao Jun Du 76 1 0 05 Sep 2025
Layer-wise Analysis for Quality of Multilingual Synthesized Speech Erica Cooper T. Okamoto Yamato Ohtani Tomoki Toda Hisashi Kawai 116 0 0 05 Sep 2025
WaveLLDM: Design and Development of a Lightweight Latent Diffusion Model for Speech Enhancement and Restoration Kevin Putra Santoso Rizka Wakhidatus Sholikah Raden Venantius Hari Ginardi 147 0 0 28 Aug 2025
DARAS: Dynamic Audio-Room Acoustic Synthesis for Blind Room Impulse Response Estimation Chunxi Wang Maoshen Jia Wenyu Jin 112 0 0 10 Jul 2025
DPN-GAN: Inducing Periodic Activations in Generative Adversarial Networks for High-Fidelity Audio Synthesis. IEEE Access (IEEE Access), 2025 Zeeshan Ahmad Shudi Bao Meng Chen 215 1 0 14 May 2025
Hierarchical Conditional Tabular GAN for Multi-Tabular Synthetic Data Generation Wilhelm Ågren Victorio Úbeda Sosa 235 2 0 11 Nov 2024
Generative Deep Learning and Signal Processing for Data Augmentation of Cardiac Auscultation Signals: Improving Model Robustness Using Synthetic Audio. Biomedical Signal Processing and Control (BSPC), 2024 Leigh Abbott Milan Marocchi Matthew Fynn Yue Rong Sven Nordholm MedIm 203 2 0 14 Oct 2024
PeriodWave: Multi-Period Flow Matching for High-Fidelity Waveform Generation. International Conference on Learning Representations (ICLR), 2024 Sang-Hoon Lee Ha-Yeong Choi Seong-Whan Lee OOD DiffM AI4TS 301 13 0 14 Aug 2024
Coarse-to-Fine Proposal Refinement Framework for Audio Temporal Forgery Detection and Localization Junyan Wu Wei Lu Xiangyang Luo Rui Yang Qian Wang Xiaochun Cao 237 11 0 23 Jul 2024
A Survey of Deep Learning Audio Generation Methods Matej Bozic Marko Horvat VLM MedIm 299 9 0 31 May 2024
Sok: Comprehensive Security Overview, Challenges, and Future Directions of Voice-Controlled Systems Haozhe Xu Cong Wu Yangyang Gu Xingcan Shang Jing Chen Kun He Ruiying Du 260 4 0 27 May 2024
Speaking in Wavelet Domain: A Simple and Efficient Approach to Speed up Speech Diffusion Model Xiangyu Zhang Daijiao Liu Hexin Liu Qiquan Zhang Hanyu Meng Leibny Paola García Chng Eng Siong Lina Yao DiffM 214 13 0 16 Feb 2024
Brain-Conditional Multimodal Synthesis: A Survey and Taxonomy. IEEE Transactions on Artificial Intelligence (IEEE TAI), 2023 Weijian Mai Jian Zhang Pengfei Fang Zhijun Zhang 457 14 0 31 Dec 2023
The Effects of Signal-to-Noise Ratio on Generative Adversarial Networks Applied to Marine Bioacoustic Data Georgia Atkinson Nick Wright A. Mcgough Per Berggren GAN 182 0 0 22 Dec 2023
A Representative Study on Human Detection of Artificially Generated Media Across Countries Joel Frank Franziska Herbert Jonas Ricker Lea Schonherr Thorsten Eisenhofer Asja Fischer Markus Dürmuth Thorsten Holz 249 28 0 10 Dec 2023
FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores. International Conference on Learning Representations (ICLR), 2023 Daniel Y. Fu Hermann Kumbong Eric N. D. Nguyen Christopher Ré VLM 252 38 0 10 Nov 2023
Enabling Acoustic Audience Feedback in Large Virtual Events Tamay Aykut M. Hofbauer Christopher B. Kuhn Eckehard Steinbach Bernd Girod 166 0 0 27 Oct 2023
Evaluating Speech Synthesis by Training Recognizers on Synthetic Speech Dareen Alharthi Roshan S. Sharma Hira Dhamyal Soumi Maiti Bhiksha Raj Rita Singh 142 7 0 01 Oct 2023
VoiceFlow: Efficient Text-to-Speech with Rectified Flow Matching. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023 Yiwei Guo Chenpeng Du Ziyang Ma Xie Chen K. Yu DiffM 264 61 0 10 Sep 2023
AI-Generated Content (AIGC) for Various Data Modalities: A Survey. ACM Computing Surveys (ACM Comput. Surv.), 2023 Lin Geng Foo Hossein Rahmani Jing Liu 708 45 0 27 Aug 2023
VITS2: Improving Quality and Efficiency of Single-Stage Text-to-Speech with Adversarial Learning and Architecture Design. Interspeech (Interspeech), 2023 Jungil Kong Jihoon Park Beomjeong Kim Jeongmin Kim Dohee Kong Sangjin Kim 205 66 0 31 Jul 2023
The Ethical Implications of Generative Audio Models: A Systematic Literature Review. AAAI/ACM Conference on AI, Ethics, and Society (AIES), 2023 J. Barnett 251 47 0 07 Jul 2023
LipVoicer: Generating Speech from Silent Videos Guided by Lip Reading. International Conference on Learning Representations (ICLR), 2023 Yochai Yemini Aviv Shamsian Lior Bracha Sharon Gannot Ethan Fetaya DiffM 326 22 0 05 Jun 2023
UnDiff: Unsupervised Voice Restoration with Unconditional Diffusion Model. Interspeech (Interspeech), 2023 A. Iashchenko Pavel Andreev Ivan Shchekotov Nicholas Babaev Dmitry Vetrov DiffM 324 7 0 01 Jun 2023
U-DiT TTS: U-Diffusion Vision Transformer for Text-to-Speech Xin Jing Yi Chang Zijiang Yang Jiang-jian Xie Andreas Triantafyllopoulos Bjoern W. Schuller 206 11 0 22 May 2023
APNet: An All-Frame-Level Neural Vocoder Incorporating Direct Prediction of Amplitude and Phase Spectra. IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023 Yang Ai Zhenhua Ling 170 25 0 13 May 2023
Learn to Sing by Listening: Building Controllable Virtual Singer by Unsupervised Learning from Voice Recordings Wei Xue Yiwen Wang Qi-fei Liu Yi-Ting Guo 175 1 0 09 May 2023
Source-Filter-Based Generative Adversarial Neural Vocoder for High Fidelity Speech Synthesis Ye-Xin Lu Yang Ai Zhenhua Ling 175 1 0 26 Apr 2023
ArmanTTS single-speaker Persian dataset Mohammd Hasan Shamgholi Vahid Saeedi J. Peymanfard Leila Alhabib Hossein Zeinali 93 3 0 07 Apr 2023
A Survey on Audio Diffusion Models: Text To Speech Synthesis and Enhancement in Generative AI Chenshuang Zhang Chaoning Zhang Sheng Zheng Mengchun Zhang Maryam Qamar Sung-Ho Bae In So Kweon DiffM MedIm 252 106 0 23 Mar 2023
Speech Modeling with a Hierarchical Transformer Dynamical VAE. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023 Xiaoyu Lin Xiaoyu Bie Simon Leglaive Laurent Girin Xavier Alameda-Pineda BDL 178 3 0 07 Mar 2023
Contrast-PLC: Contrastive Learning for Packet Loss Concealment. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023 Huaying Xue Xiulian Peng Yan Lu 173 7 0 26 Feb 2023
Separate And Diffuse: Using a Pretrained Diffusion Model for Improving Source Separation Shahar Lutati Eliya Nachmani Lior Wolf DiffM 205 19 0 25 Jan 2023
MooseNet: A Trainable Metric for Synthesized Speech with a PLDA Module. Speech Synthesis Workshop (SSW), 2023 Ondřej Plátek Ondrej Dusek 176 2 0 17 Jan 2023
ResGrad: Residual Denoising Diffusion Probabilistic Models for Text to Speech Ze Chen Yihan Wu Yichong Leng Jiawei Chen Haohe Liu ... Ke Wang Lei He Sheng Zhao Jiang Bian Danilo Mandic DiffM 204 26 0 30 Dec 2022
Semantics-Empowered Communication: A Tutorial-cum-Survey Zhilin Lu Rongpeng Li Kun Lu Xianfu Chen Ekram Hossain Zhifeng Zhao Honggang Zhang 503 23 0 16 Dec 2022
BLASER: A Text-Free Speech-to-Speech Translation Evaluation Metric. Annual Meeting of the Association for Computational Linguistics (ACL), 2022 Mingda Chen Paul-Ambroise Duquenne Pierre Yves Andrews Justine T. Kao Alexandre Mourachko Holger Schwenk Marta R. Costa-jussá 246 23 0 16 Dec 2022
Evaluating and reducing the distance between synthetic and real speech distributions. Interspeech (Interspeech), 2022 Christoph Minixhofer Ondřej Klejch P. Bell 217 9 0 29 Nov 2022
Deep Fake Detection, Deterrence and Response: Challenges and Opportunities Amin Azmoodeh Ali Dehghantanha 173 3 0 26 Nov 2022
Towards Building Text-To-Speech Systems for the Next Billion Users. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022 Gokul Karthik Kumar V. PraveenS. Pratyush Kumar Mitesh M. Khapra Karthik Nandakumar 271 28 0 17 Nov 2022
NANSY++: Unified Voice Synthesis with Neural Analysis and Synthesis. International Conference on Learning Representations (ICLR), 2022 Hyeong-Seok Choi Jinhyeok Yang Juheon Lee Hyeongju Kim 228 54 0 17 Nov 2022
Efficiently Trained Low-Resource Mongolian Text-to-Speech System Based On FullConv-TTS Ziqi Liang 178 0 0 24 Oct 2022
Adversarial Permutation Invariant Training for Universal Sound Separation. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022 Emilian Postolache Jordi Pons Santiago Pascual Joan Serrà VLM 269 10 0 21 Oct 2022
Improving robustness of spontaneous speech synthesis with linguistic speech regularization and pseudo-filled-pause insertion. Speech Synthesis Workshop (SSW), 2022 Yuta Matsunaga Takaaki Saeki Shinnosuke Takamichi Hiroshi Saruwatari 212 2 0 18 Oct 2022
Images as Weight Matrices: Sequential Image Generation Through Synaptic Learning Rules. International Conference on Learning Representations (ICLR), 2022 Kazuki Irie Jürgen Schmidhuber 229 9 0 07 Oct 2022
Mandarin Singing Voice Synthesis with Denoising Diffusion Probabilistic Wasserstein GAN. Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 2022 Yin-Ping Cho Yu Tsao Hsin-Min Wang Yi-Wen Liu DiffM 221 9 0 21 Sep 2022
Lightweight Long-Range Generative Adversarial Networks Bowen Li Thomas Lukasiewicz GAN 178 4 0 08 Sep 2022
AudioLM: a Language Modeling Approach to Audio Generation. IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2022 Zalan Borsos Raphaël Marinier Damien Vincent Eugene Kharitonov Olivier Pietquin ... Dominik Roblek O. Teboul David Grangier Marco Tagliasacchi Neil Zeghidour AuLLM 392 813 0 07 Sep 2022