arXiv:2011.01557
StyleMelGAN: An Efficient High-Fidelity Adversarial Vocoder with Temporal Adaptive Normalization
3 November 2020
Ahmed Mustafa
N. Pia
Guillaume Fuchs
Papers citing
"StyleMelGAN: An Efficient High-Fidelity Adversarial Vocoder with Temporal Adaptive Normalization"
44 / 44 papers shown
Real-Time Streaming Mel Vocoding with Generative Flow Matching
Simon Welker
Tal Peer
Timo Gerkmann
133
1
0
18 Sep 2025
AUDETER: A Large-scale Dataset for Deepfake Audio Detection in Open Worlds
Qizhou Wang
Hanxun Huang
Guansong Pang
S. Erfani
Christopher Leckie
205
0
0
04 Sep 2025
SpeechFake: A Large-Scale Multilingual Speech Deepfake Dataset Incorporating Cutting-Edge Generation Methods
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Wen Huang
Yanmei Gu
Zhiming Wang
Huijia Zhu
Yanmin Qian
248
10
0
29 Jul 2025
DPN-GAN: Inducing Periodic Activations in Generative Adversarial Networks for High-Fidelity Audio Synthesis
IEEE Access (IEEE Access), 2025
Zeeshan Ahmad
Shudi Bao
Meng Chen
280
5
0
14 May 2025
SafeEar: Content Privacy-Preserving Audio Deepfake Detection
Conference on Computer and Communications Security (CCS), 2024
Xinfeng Li
Kai Li
Yifan Zheng
Chen Yan
Xiaoyu Ji
Wei Dong
286
49
0
14 Sep 2024
Muskits-ESPnet: A Comprehensive Toolkit for Singing Voice Synthesis in New Paradigm
ACM Multimedia (MM), 2024
Yuning Wu
Jiatong Shi
Yifeng Yu
Yuxun Tang
Tao Qian
Yueqian Lin
Jionghao Han
Xinyi Bai
Shinji Watanabe
Qin Jin
281
7
0
11 Sep 2024
Enhancing Kurdish Text-to-Speech with Native Corpus Training: A High-Quality WaveGlow Vocoder Approach
Abdulhady Abas Abdullah
Sabat Salih Muhamad
Hadi Veisi
253
0
0
10 Sep 2024
VNet: A GAN-based Multi-Tier Discriminator Network for Speech Synthesis Vocoders
IEEE International Conference on Systems, Man and Cybernetics (SMC), 2024
Yubing Cao
Yongming Li
Liejun Wang
Yinfeng Yu
177
2
0
13 Aug 2024
ADD 2023: Towards Audio Deepfake Detection and Analysis in the Wild
Jiangyan Yi
Chu Yuan Zhang
Jianhua Tao
Chenglong Wang
Xinrui Yan
Yong Ren
Hao Gu
Junzuo Zhou
324
15
0
09 Aug 2024
Fine-Grained and Interpretable Neural Speech Editing
Max Morrison
Cameron Churchwell
Nathan Pruyne
Bryan Pardo
323
13
0
07 Jul 2024
GANetic Loss for Generative Adversarial Networks with a Focus on Medical Applications
S. Akhmedova
Nils Körber
GAN
MedIm
272
1
0
07 Jun 2024
Training Generative Adversarial Network-Based Vocoder with Limited Data Using Augmentation-Conditional Discriminator
Takuhiro Kaneko
Hirokazu Kameoka
Kou Tanaka
205
1
0
25 Mar 2024
An Intra-BRNN and GB-RVQ Based END-TO-END Neural Audio Codec
Linping Xu
Jiawei Jiang
Dejun Zhang
Xianjun Xia
Li Chen
Yijian Xiao
Piao Ding
Shenyi Song
Sixing Yin
Ferdous Sohel
238
10
0
02 Feb 2024
Rapid Speaker Adaptation in Low Resource Text to Speech Systems using Synthetic Data and Transfer learning
Pacific Asia Conference on Language, Information and Computation (PACLIC), 2023
Raviraj Joshi
Nikesh Garera
311
2
0
02 Dec 2023
Code-Mixed Text to Speech Synthesis under Low-Resource Constraints
International Conference on Speech and Computer (SPECOM), 2023
Raviraj Joshi
Nikesh Garera
225
0
0
02 Dec 2023
Distinguishing Neural Speech Synthesis Models Through Fingerprints in Speech Waveforms
China National Conference on Chinese Computational Linguistics (CNCCL), 2023
Chu Yuan Zhang
Jiangyan Yi
Jianhua Tao
Chenglong Wang
Xinrui Yan
298
14
0
13 Sep 2023
iSTFTNet2: Faster and More Lightweight iSTFT-Based Neural Vocoder Using 1D-2D CNN
Interspeech (Interspeech), 2023
Takuhiro Kaneko
Hirokazu Kameoka
Kou Tanaka
Shogo Seki
204
9
0
14 Aug 2023
CQNV: A combination of coarsely quantized bitstream and neural vocoder for low rate speech coding
Interspeech (Interspeech), 2023
Youqiang Zheng
Li Xiao
Weiping Tu
Yuhong Yang
Xinmeng Xu
336
6
0
25 Jul 2023
Large-scale unsupervised audio pre-training for video-to-speech synthesis
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023
Triantafyllos Kefalas
Yannis Panagakis
Maja Pantic
VGen
305
5
0
27 Jun 2023
Low-Resource Text-to-Speech Using Specific Data and Noise Augmentation
European Signal Processing Conference (EUSIPCO), 2023
K. Lakshminarayana
C. Dittmar
N. Pia
Emanuel Habets
240
2
0
16 Jun 2023
Enhancing Speech-to-Speech Translation with Multiple TTS Targets
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Jiatong Shi
Yun Tang
Ann Lee
Hirofumi Inaguma
Changhan Wang
J. Pino
Shinji Watanabe
184
11
0
10 Apr 2023
Wave-U-Net Discriminator: Fast and Lightweight Discriminator for Generative Adversarial Network-Based Speech Synthesis
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Takuhiro Kaneko
Hirokazu Kameoka
Kou Tanaka
Shogo Seki
217
12
0
24 Mar 2023
Configurable EBEN: Extreme Bandwidth Extension Network to enhance body-conducted speech capture
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023
Julien Hauret
Thomas Joubaud
V. Zimpfer
Éric Bavu
301
13
0
17 Mar 2023
Framewise WaveGAN: High Speed Adversarial Vocoder in Time Domain with Very Low Computational Complexity
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Ahmed Mustafa
J. Valin
Jan Büthe
Paris Smaragdis
Mike Goodwin
202
8
0
08 Dec 2022
Source-Filter HiFi-GAN: Fast and Pitch Controllable High-Fidelity Neural Vocoder
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Reo Yoneyama
Yi-Chiao Wu
Tomoki Toda
317
37
0
27 Oct 2022
Improved Normalizing Flow-Based Speech Enhancement using an All-pole Gammatone Filterbank for Conditional Input Representation
Spoken Language Technology Workshop (SLT), 2022
Martin Strauss
Matteo Torcoli
B. Edler
225
7
0
21 Oct 2022
An Initial Investigation for Detecting Vocoder Fingerprints of Fake Audio
Xin Yan
Jiangyan Yi
Jianhua Tao
Chenglong Wang
Haoxin Ma
Tao Wang
Shiming Wang
Ruibo Fu
162
46
0
20 Aug 2022
Controllable and Lossless Non-Autoregressive End-to-End Text-to-Speech
Zhengxi Liu
Qiao Tian
Chenxu Hu
Xudong Liu
Meng-Che Wu
Yuping Wang
Hang Zhao
Yuxuan Wang
212
12
0
13 Jul 2022
CFAD: A Chinese Dataset for Fake Audio Detection
Speech Communication (Speech Commun.), 2022
Haoxin Ma
Jiangyan Yi
Chenglong Wang
Xin Yan
Jianhua Tao
Tao Wang
Shiming Wang
Ruibo Fu
225
58
0
12 Jul 2022
Avocodo: Generative Adversarial Network for Artifact-free Vocoder
AAAI Conference on Artificial Intelligence (AAAI), 2022
Taejun Bak
Junmo Lee
Hanbin Bae
Jinhyeok Yang
Jaesung Bae
Young-Sun Joo
316
44
0
27 Jun 2022
WOLONet: Wave Outlooker for Efficient and High Fidelity Speech Synthesis
Yi Wang
Yi Si
128
0
0
20 Jun 2022
BigVGAN: A Universal Neural Vocoder with Large-Scale Training
International Conference on Learning Representations (ICLR), 2022
Sang-gil Lee
Ming-Yu Liu
Boris Ginsburg
Bryan Catanzaro
Sung-Hoon Yoon
368
415
0
09 Jun 2022
PaddleSpeech: An Easy-to-Use All-in-One Speech Toolkit
North American Chapter of the Association for Computational Linguistics (NAACL), 2022
Hui Zhang
Tian Yuan
Junkun Chen
Xintong Li
Renjie Zheng
...
Zeyu Chen
Xiaoguang Hu
Dianhai Yu
Yanjun Ma
Liang Huang
AuLLM
181
33
0
20 May 2022
Muskits: an End-to-End Music Processing Toolkit for Singing Voice Synthesis
Interspeech (Interspeech), 2022
Jiatong Shi
Shuai Guo
Tao Qian
Nan Huo
Tomoki Hayashi
...
Xuankai Chang
Hua-Wei Li
Peter Wu
Shinji Watanabe
Qin Jin
VLM
266
34
0
09 May 2022
SVTS: Scalable Video-to-Speech Synthesis
Interspeech (Interspeech), 2022
Rodrigo Mira
A. Haliassos
Stavros Petridis
Björn W. Schuller
Maja Pantic
262
42
0
04 May 2022
Practical cognitive speech compression
Reza Lotfidereshgi
P. Gournay
244
2
0
08 Mar 2022
iSTFTNet: Fast and Lightweight Mel-Spectrogram Vocoder Incorporating Inverse Short-Time Fourier Transform
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Takuhiro Kaneko
Kou Tanaka
Hirokazu Kameoka
Shogo Seki
250
89
0
04 Mar 2022
PostGAN: A GAN-Based Post-Processor to Enhance the Quality of Coded Speech
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Srikanth Korse
N. Pia
Kishan Gupta
Guillaume Fuchs
306
22
0
31 Jan 2022
ESPnet2-TTS: Extending the Edge of TTS Research
Tomoki Hayashi
Ryuichi Yamamoto
Takenori Yoshimura
Peter Wu
Jiatong Shi
Takaaki Saeki
Yooncheol Ju
Yusuke Yasuda
Shinnosuke Takamichi
Shinji Watanabe
VLM
200
73
0
15 Oct 2021
SingGAN: Generative Adversarial Network For High-Fidelity Singing Voice Generation
Rongjie Huang
Chenye Cui
Feiyang Chen
Yi Ren
Jinglin Liu
Zhou Zhao
Baoxing Huai
N. Yuan
GAN
440
71
0
14 Oct 2021
FlowVocoder: A small Footprint Neural Vocoder based Normalizing flow for Speech Synthesis
Interspeech (Interspeech), 2021
Manh Luong
Viet-Anh Tran
137
3
0
27 Sep 2021
A Streamwise GAN Vocoder for Wideband Speech Coding at Very Low Bit Rate
IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 2021
Ahmed Mustafa
Jan Büthe
Srikanth Korse
Kishan Gupta
Guillaume Fuchs
N. Pia
283
23
0
09 Aug 2021
Improving the expressiveness of neural vocoding with non-affine Normalizing Flows
Adam Gabryś
Yunlong Jiao
V. Klimkov
Daniel Korzekwa
Roberto Barra-Chicote
231
1
0
16 Jun 2021
UnivNet: A Neural Vocoder with Multi-Resolution Spectrogram Discriminators for High-Fidelity Waveform Generation
Interspeech (Interspeech), 2021
Won Jang
D. Lim
Jaesam Yoon
Bongwan Kim
Juntae Kim
362
188
0
15 Jun 2021