
StyleMelGAN: An Efficient High-Fidelity Adversarial Vocoder with Temporal Adaptive Normalization

3 November 2020

Papers citing "StyleMelGAN: An Efficient High-Fidelity Adversarial Vocoder with Temporal Adaptive Normalization"

44 / 44 papers shown

Real-Time Streaming Mel Vocoding with Generative Flow Matching

Simon Welker

Tal Peer

Timo Gerkmann

133

18 Sep 2025

AUDETER: A Large-scale Dataset for Deepfake Audio Detection in Open Worlds

205

04 Sep 2025

SpeechFake: A Large-Scale Multilingual Speech Deepfake Dataset Incorporating Cutting-Edge Generation Methods

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

248

29 Jul 2025

DPN-GAN: Inducing Periodic Activations in Generative Adversarial Networks for High-Fidelity Audio Synthesis

IEEE Access, 2025

Zeeshan Ahmad

Shudi Bao

Meng Chen

280

14 May 2025

SafeEar: Content Privacy-Preserving Audio Deepfake Detection

Conference on Computer and Communications Security (CCS), 2024

Xinfeng Li

Xiaoyu Ji

286

14 Sep 2024

Muskits-ESPnet: A Comprehensive Toolkit for Singing Voice Synthesis in New Paradigm

ACM Multimedia (MM), 2024

Yuning Wu

Jiatong Shi

Shinji Watanabe

281

11 Sep 2024

Enhancing Kurdish Text-to-Speech with Native Corpus Training: A High-Quality WaveGlow Vocoder Approach

Abdulhady Abas Abdullah

Sabat Salih Muhamad

Hadi Veisi

253

10 Sep 2024

VNet: A GAN-based Multi-Tier Discriminator Network for Speech Synthesis Vocoders

IEEE International Conference on Systems, Man and Cybernetics (SMC), 2024

Yubing Cao

Yongming Li

Liejun Wang

Yinfeng Yu

177

13 Aug 2024

ADD 2023: Towards Audio Deepfake Detection and Analysis in the Wild

Junzuo Zhou

324

09 Aug 2024

Fine-Grained and Interpretable Neural Speech Editing

Max Morrison

Cameron Churchwell

Nathan Pruyne

Bryan Pardo

323

07 Jul 2024

GANetic Loss for Generative Adversarial Networks with a Focus on Medical Applications

S. Akhmedova

Nils Körber

GAN MedIm

272

07 Jun 2024

Training Generative Adversarial Network-Based Vocoder with Limited Data Using Augmentation-Conditional Discriminator

Takuhiro Kaneko

Hirokazu Kameoka

Kou Tanaka

205

25 Mar 2024

An Intra-BRNN and GB-RVQ Based END-TO-END Neural Audio Codec

Jiawei Jiang

238

02 Feb 2024

Rapid Speaker Adaptation in Low Resource Text to Speech Systems using Synthetic Data and Transfer learning

Pacific Asia Conference on Language, Information and Computation (PACLIC), 2023

Raviraj Joshi

Nikesh Garera

311

02 Dec 2023

Code-Mixed Text to Speech Synthesis under Low-Resource Constraints

International Conference on Speech and Computer (SPECOM), 2023

Raviraj Joshi

Nikesh Garera

225

02 Dec 2023

Distinguishing Neural Speech Synthesis Models Through Fingerprints in Speech Waveforms

China National Conference on Chinese Computational Linguistics (CNCCL), 2023

Jiangyan Yi

298

13 Sep 2023

iSTFTNet2: Faster and More Lightweight iSTFT-Based Neural Vocoder Using 1D-2D CNN

Interspeech, 2023

204

14 Aug 2023

CQNV: A combination of coarsely quantized bitstream and neural vocoder for low rate speech coding

Interspeech, 2023

336

25 Jul 2023

Large-scale unsupervised audio pre-training for video-to-speech synthesis

IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023

Triantafyllos Kefalas

Yannis Panagakis

Maja Pantic

VGen

305

27 Jun 2023

Low-Resource Text-to-Speech Using Specific Data and Noise Augmentation

European Signal Processing Conference (EUSIPCO), 2023

240

16 Jun 2023

Enhancing Speech-to-Speech Translation with Multiple TTS Targets

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

Jiatong Shi

184

10 Apr 2023

Wave-U-Net Discriminator: Fast and Lightweight Discriminator for Generative Adversarial Network-Based Speech Synthesis

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

217

24 Mar 2023

Configurable EBEN: Extreme Bandwidth Extension Network to enhance body-conducted speech capture

IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023

301

17 Mar 2023

Framewise WaveGAN: High Speed Adversarial Vocoder in Time Domain with Very Low Computational Complexity

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022

202

08 Dec 2022

Source-Filter HiFi-GAN: Fast and Pitch Controllable High-Fidelity Neural Vocoder

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022

Reo Yoneyama

Yi-Chiao Wu

Tomoki Toda

317

27 Oct 2022

Improved Normalizing Flow-Based Speech Enhancement using an All-pole Gammatone Filterbank for Conditional Input Representation

Spoken Language Technology Workshop (SLT), 2022

Martin Strauss

Matteo Torcoli

B. Edler

225

21 Oct 2022

An Initial Investigation for Detecting Vocoder Fingerprints of Fake Audio

Jiangyan Yi

Tao Wang

162

20 Aug 2022

Controllable and Lossless Non-Autoregressive End-to-End Text-to-Speech

Hang Zhao

Yuxuan Wang

212

13 Jul 2022

CFAD: A Chinese Dataset for Fake Audio Detection

Speech Communication (Speech Commun.), 2022

Jiangyan Yi

Tao Wang

225

12 Jul 2022

Avocodo: Generative Adversarial Network for Artifact-free Vocoder

AAAI Conference on Artificial Intelligence (AAAI), 2022

316

27 Jun 2022

WOLONet: Wave Outlooker for Efficient and High Fidelity Speech Synthesis

Yi Wang

Yi Si

128

20 Jun 2022

BigVGAN: A Universal Neural Vocoder with Large-Scale Training

International Conference on Learning Representations (ICLR), 2022

Boris Ginsburg

368

415

09 Jun 2022

PaddleSpeech: An Easy-to-Use All-in-One Speech Toolkit

North American Chapter of the Association for Computational Linguistics (NAACL), 2022

...

Dianhai Yu

181

20 May 2022

Muskits: an End-to-End Music Processing Toolkit for Singing Voice Synthesis

Interspeech, 2022

Jiatong Shi

Tao Qian

...

Peter Wu

Qin Jin

266

09 May 2022

SVTS: Scalable Video-to-Speech Synthesis

Interspeech, 2022

Björn W. Schuller

262

04 May 2022

Practical cognitive speech compression

Reza Lotfidereshgi

P. Gournay

244

08 Mar 2022

iSTFTNet: Fast and Lightweight Mel-Spectrogram Vocoder Incorporating Inverse Short-Time Fourier Transform

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022

250

04 Mar 2022

PostGAN: A GAN-Based Post-Processor to Enhance the Quality of Coded Speech

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022

306

31 Jan 2022

ESPnet2-TTS: Extending the Edge of TTS Research

Tomoki Hayashi

Ryuichi Yamamoto

Takenori Yoshimura

Peter Wu

Jiatong Shi

200

15 Oct 2021

SingGAN: Generative Adversarial Network For High-Fidelity Singing Voice Generation

Rongjie Huang

Zhou Zhao

440

14 Oct 2021

FlowVocoder: A small Footprint Neural Vocoder based Normalizing flow for Speech Synthesis

Interspeech, 2021

Manh Luong

Viet-Anh Tran

137

27 Sep 2021

A Streamwise GAN Vocoder for Wideband Speech Coding at Very Low Bit Rate

IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 2021

283

09 Aug 2021

Improving the expressiveness of neural vocoding with non-affine Normalizing Flows

Roberto Barra-Chicote

231

16 Jun 2021

UnivNet: A Neural Vocoder with Multi-Resolution Spectrogram Discriminators for High-Fidelity Waveform Generation

Interspeech, 2021

362

188

15 Jun 2021