Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2005.05106
Cited By
Multi-band MelGAN: Faster Waveform Generation for High-Quality Text-to-Speech
11 May 2020
Geng Yang
Shan Yang
Kai-Chun Liu
Peng Fang
Wei-Neng Chen
Lei Xie
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Multi-band MelGAN: Faster Waveform Generation for High-Quality Text-to-Speech"
50 / 107 papers shown
Title
DPN-GAN: Inducing Periodic Activations in Generative Adversarial Networks for High-Fidelity Audio Synthesis
Zeeshan Ahmad
Shudi Bao
Meng Chen
18
0
0
14 May 2025
MiniMax-Speech: Intrinsic Zero-Shot Text-to-Speech with a Learnable Speaker Encoder
Bowen Zhang
Congchao Guo
Geng Yang
Hang Yu
H. M. Zhang
...
Yichen Xiao
Yiying Zhou
Y. Zhang
Yuan Lu
Yucen He
26
0
0
12 May 2025
Lightweight End-to-end Text-to-speech Synthesis for low resource on-device applications
Biel Tura Vecino
Adam Gabry's
Daniel Mątwicki
Andrzej Pomirski
Tom Iddon
Marius Cotescu
Jaime Lorenzo-Trueba
34
0
0
12 May 2025
Measuring the Robustness of Audio Deepfake Detectors
Xiang Li
Pin-Yu Chen
Wenqi Wei
38
0
0
21 Mar 2025
Less is More for Synthetic Speech Detection in the Wild
Ashi Garg
Zexin Cai
Henry Li Xinyuan
Leibny Paola García-Perera
Kevin Duh
Sanjeev Khudanpur
Matthew Wiesner
Nicholas Andrews
74
0
0
17 Feb 2025
Wavehax: Aliasing-Free Neural Waveform Synthesis Based on 2D Convolution and Harmonic Prior for Reliable Complex Spectrogram Estimation
Reo Yoneyama
Atsushi Miyashita
Ryuichi Yamamoto
T. Toda
27
1
0
11 Nov 2024
Restorative Speech Enhancement: A Progressive Approach Using SE and Codec Modules
Hsin-Tien Chiang
Hao Zhang
Yong Xu
Meng Yu
Dong Yu
28
1
0
02 Oct 2024
SafeEar: Content Privacy-Preserving Audio Deepfake Detection
Xinfeng Li
Kai Li
Yifan Zheng
Chen Yan
Xiaoyu Ji
Wenyuan Xu
31
13
0
14 Sep 2024
InstructSing: High-Fidelity Singing Voice Generation via Instructing Yourself
Chang Zeng
Chunhui Wang
Xiaoxiao Miao
Jian Zhao
Zhonglin Jiang
Yong Chen
33
0
0
10 Sep 2024
ADD 2023: Towards Audio Deepfake Detection and Analysis in the Wild
Jiangyan Yi
Chu Yuan Zhang
Jianhua Tao
Chenglong Wang
Xinrui Yan
Yong Ren
Hao Gu
Junzuo Zhou
50
1
0
09 Aug 2024
Central Kurdish Text-to-Speech Synthesis with Novel End-to-End Transformer Training
Hawraz A. Ahmad
Tarik A. Rashid
33
0
0
06 Aug 2024
Source Tracing of Audio Deepfake Systems
Nicholas Klein
Tianxiang Chen
Hemlata Tak
Ricardo Casal
Elie Khoury
32
4
0
10 Jul 2024
RapVerse: Coherent Vocals and Whole-Body Motions Generations from Text
Jiaben Chen
Xin Yan
Yihang Chen
Siyuan Cen
Qinwei Ma
Haoyu Zhen
Kaizhi Qian
Lie Lu
Chuang Gan
38
0
0
30 May 2024
Training Generative Adversarial Network-Based Vocoder with Limited Data Using Augmentation-Conditional Discriminator
Takuhiro Kaneko
Hirokazu Kameoka
Kou Tanaka
21
0
0
25 Mar 2024
RFWave: Multi-band Rectified Flow for Audio Waveform Reconstruction
Peng Liu
Dongyang Dai
Zhiyong Wu
24
2
0
08 Mar 2024
EVA-GAN: Enhanced Various Audio Generation via Scalable Generative Adversarial Networks
Shijia Liao
Shiyi Lan
Arun George Zachariah
10
1
0
31 Jan 2024
Ultra-lightweight Neural Differential DSP Vocoder For High Quality Speech Synthesis
Prabhav Agrawal
Thilo Köhler
Zhiping Xiu
Prashant Serai
Qing He
18
1
0
19 Jan 2024
Enabling Acoustic Audience Feedback in Large Virtual Events
Tamay Aykut
M. Hofbauer
Christopher B. Kuhn
Eckehard Steinbach
Bernd Girod
38
0
0
27 Oct 2023
FastGraphTTS: An Ultrafast Syntax-Aware Speech Synthesis Framework
Jianzong Wang
Xulong Zhang
Aolan Sun
Ning Cheng
Jing Xiao
29
1
0
16 Sep 2023
Cross-lingual Knowledge Distillation via Flow-based Voice Conversion for Robust Polyglot Text-To-Speech
Dariusz Piotrowski
Renard Korzeniowski
Alessio Falai
Sebastian Cygert
Kamil Pokora
Georgi Tinchev
Ziyao Zhang
K. Yanagisawa
23
1
0
15 Sep 2023
Distinguishing Neural Speech Synthesis Models Through Fingerprints in Speech Waveforms
Chu Yuan Zhang
Jiangyan Yi
Jianhua Tao
Chenglong Wang
Xinrui Yan
8
2
0
13 Sep 2023
AI-Generated Content (AIGC) for Various Data Modalities: A Survey
Lin Geng Foo
Hossein Rahmani
J. Liu
73
31
0
27 Aug 2023
iSTFTNet2: Faster and More Lightweight iSTFT-Based Neural Vocoder Using 1D-2D CNN
Takuhiro Kaneko
Hirokazu Kameoka
Kou Tanaka
Shogo Seki
23
4
0
14 Aug 2023
Large-scale unsupervised audio pre-training for video-to-speech synthesis
Triantafyllos Kefalas
Yannis Panagakis
M. Pantic
VGen
32
3
0
27 Jun 2023
High-Fidelity Audio Compression with Improved RVQGAN
Rithesh Kumar
Prem Seetharaman
Alejandro Luebs
I. Kumar
Kundan Kumar
33
282
0
11 Jun 2023
Towards generalizing deep-audio fake detection networks
Konstantin Gasenzer
Moritz Wolter
28
4
0
22 May 2023
APNet: An All-Frame-Level Neural Vocoder Incorporating Direct Prediction of Amplitude and Phase Spectra
Yang Ai
Zhenhua Ling
20
13
0
13 May 2023
Using Deepfake Technologies for Word Emphasis Detection
Eran Kaufman
Lee-Ad Gottlieb
14
0
0
12 May 2023
AI-Synthesized Voice Detection Using Neural Vocoder Artifacts
Chengzhe Sun
Shan Jia
Shuwei Hou
Siwei Lyu
30
38
0
25 Apr 2023
Wave-U-Net Discriminator: Fast and Lightweight Discriminator for Generative Adversarial Network-Based Speech Synthesis
Takuhiro Kaneko
Hirokazu Kameoka
Kou Tanaka
Shogo Seki
21
9
0
24 Mar 2023
Configurable EBEN: Extreme Bandwidth Extension Network to enhance body-conducted speech capture
Hauret Julien
Joubaud Thomas
V. Zimpfer
Bavu Éric
11
6
0
17 Mar 2023
Continuous descriptor-based control for deep audio synthesis
Ninon Devis
Nils Demerlé
Sarah Nabi
David Genova
P. Esling
17
9
0
27 Feb 2023
Exposing AI-Synthesized Human Voices Using Neural Vocoder Artifacts
Chengzhe Sun
Shan Jia
Shuwei Hou
Ehab AlBadawy
Siwei Lyu
120
3
0
18 Feb 2023
Towards Building Text-To-Speech Systems for the Next Billion Users
Gokul Karthik Kumar
V. PraveenS.
Pratyush Kumar
Mitesh M. Khapra
Karthik Nandakumar
36
18
0
17 Nov 2022
Distinguishable Speaker Anonymization based on Formant and Fundamental Frequency Scaling
Jixun Yao
Qing Wang
Yi Lei
Pengcheng Guo
Linfu Xie
Namin Wang
Jie Liu
25
13
0
06 Nov 2022
DSPGAN: a GAN-based universal vocoder for high-fidelity TTS by time-frequency domain supervision from DSP
Kun Song
Yongmao Zhang
Yinjiao Lei
Jian Cong
Hanzhao Li
Linfu Xie
Gang He
Jinfeng Bai
51
15
0
02 Nov 2022
Modelling black-box audio effects with time-varying feature modulation
Marco Comunità
C. Steinmetz
Huy Phan
Joshua D. Reiss
41
14
0
01 Nov 2022
Robust MelGAN: A robust universal neural vocoder for high-fidelity TTS
Kun Song
Jian Cong
Xinsheng Wang
Yongmao Zhang
Linfu Xie
Ning Jiang
Haiying Wu
19
0
0
31 Oct 2022
Lightweight and High-Fidelity End-to-End Text-to-Speech with Multi-Band Generation and Inverse Short-Time Fourier Transform
Masaya Kawamura
Yuma Shirahata
Ryuichi Yamamoto
Kentaro Tachibana
24
15
0
28 Oct 2022
Spoofed training data for speech spoofing countermeasure can be efficiently created using neural vocoders
Xin Wang
Junichi Yamagishi
16
36
0
19 Oct 2022
SpecRNet: Towards Faster and More Accessible Audio DeepFake Detection
Piotr Kawa
Marcin Plata
P. Syga
24
14
0
12 Oct 2022
WaveFit: An Iterative and Non-autoregressive Neural Vocoder based on Fixed-Point Iteration
Yuma Koizumi
Kohei Yatabe
Heiga Zen
M. Bacchiani
DiffM
42
29
0
03 Oct 2022
NWPU-ASLP System for the VoicePrivacy 2022 Challenge
Jixun Yao
Qing Wang
Li Lyna Zhang
Pengcheng Guo
Yuhao Liang
Linfu Xie
PICV
21
16
0
24 Sep 2022
HPPNet: Modeling the Harmonic Structure and Pitch Invariance in Piano Transcription
Weixing Wei
P. Li
Yi Yu
Wei Li
11
14
0
30 Aug 2022
An Initial Investigation for Detecting Vocoder Fingerprints of Fake Audio
Xin Yan
Jiangyan Yi
J. Tao
Chenglong Wang
Haoxin Ma
Tao Wang
Shiming Wang
Ruibo Fu
12
25
0
20 Aug 2022
A Study of Modeling Rising Intonation in Cantonese Neural Speech Synthesis
Qibing Bai
Tom Ko
Yu Zhang
16
4
0
03 Aug 2022
Subband-based Generative Adversarial Network for Non-parallel Many-to-many Voice Conversion
Jianchun Ma
Zhedong Zheng
Hao Fei
Feng Zheng
Tat-Seng Chua
Yi Yang
GAN
29
0
0
13 Jul 2022
CFAD: A Chinese Dataset for Fake Audio Detection
Haoxin Ma
Jiangyan Yi
Chenglong Wang
Xin Yan
J. Tao
Tao Wang
Shiming Wang
Ruibo Fu
16
26
0
12 Jul 2022
End-to-end speech recognition modeling from de-identified data
M. Flechl
Shou-Chun Yin
Junho Park
Peter Skala
13
4
0
12 Jul 2022
WeSinger 2: Fully Parallel Singing Voice Synthesis via Multi-Singer Conditional Adversarial Training
Zewang Zhang
Yibin Zheng
Xinhui Li
Li Lu
DiffM
23
11
0
05 Jul 2022
1
2
3
Next