ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2005.05106
  4. Cited By
Multi-band MelGAN: Faster Waveform Generation for High-Quality
  Text-to-Speech

Multi-band MelGAN: Faster Waveform Generation for High-Quality Text-to-Speech

11 May 2020
Geng Yang
Shan Yang
Kai-Chun Liu
Peng Fang
Wei-Neng Chen
Lei Xie
ArXivPDFHTML

Papers citing "Multi-band MelGAN: Faster Waveform Generation for High-Quality Text-to-Speech"

50 / 107 papers shown
Title
DPN-GAN: Inducing Periodic Activations in Generative Adversarial Networks for High-Fidelity Audio Synthesis
DPN-GAN: Inducing Periodic Activations in Generative Adversarial Networks for High-Fidelity Audio Synthesis
Zeeshan Ahmad
Shudi Bao
Meng Chen
18
0
0
14 May 2025
MiniMax-Speech: Intrinsic Zero-Shot Text-to-Speech with a Learnable Speaker Encoder
MiniMax-Speech: Intrinsic Zero-Shot Text-to-Speech with a Learnable Speaker Encoder
Bowen Zhang
Congchao Guo
Geng Yang
Hang Yu
H. M. Zhang
...
Yichen Xiao
Yiying Zhou
Y. Zhang
Yuan Lu
Yucen He
26
0
0
12 May 2025
Lightweight End-to-end Text-to-speech Synthesis for low resource on-device applications
Lightweight End-to-end Text-to-speech Synthesis for low resource on-device applications
Biel Tura Vecino
Adam Gabry's
Daniel Mątwicki
Andrzej Pomirski
Tom Iddon
Marius Cotescu
Jaime Lorenzo-Trueba
34
0
0
12 May 2025
Measuring the Robustness of Audio Deepfake Detectors
Measuring the Robustness of Audio Deepfake Detectors
Xiang Li
Pin-Yu Chen
Wenqi Wei
38
0
0
21 Mar 2025
Less is More for Synthetic Speech Detection in the Wild
Less is More for Synthetic Speech Detection in the Wild
Ashi Garg
Zexin Cai
Henry Li Xinyuan
Leibny Paola García-Perera
Kevin Duh
Sanjeev Khudanpur
Matthew Wiesner
Nicholas Andrews
74
0
0
17 Feb 2025
Wavehax: Aliasing-Free Neural Waveform Synthesis Based on 2D Convolution
  and Harmonic Prior for Reliable Complex Spectrogram Estimation
Wavehax: Aliasing-Free Neural Waveform Synthesis Based on 2D Convolution and Harmonic Prior for Reliable Complex Spectrogram Estimation
Reo Yoneyama
Atsushi Miyashita
Ryuichi Yamamoto
T. Toda
27
1
0
11 Nov 2024
Restorative Speech Enhancement: A Progressive Approach Using SE and
  Codec Modules
Restorative Speech Enhancement: A Progressive Approach Using SE and Codec Modules
Hsin-Tien Chiang
Hao Zhang
Yong Xu
Meng Yu
Dong Yu
28
1
0
02 Oct 2024
SafeEar: Content Privacy-Preserving Audio Deepfake Detection
SafeEar: Content Privacy-Preserving Audio Deepfake Detection
Xinfeng Li
Kai Li
Yifan Zheng
Chen Yan
Xiaoyu Ji
Wenyuan Xu
31
13
0
14 Sep 2024
InstructSing: High-Fidelity Singing Voice Generation via Instructing
  Yourself
InstructSing: High-Fidelity Singing Voice Generation via Instructing Yourself
Chang Zeng
Chunhui Wang
Xiaoxiao Miao
Jian Zhao
Zhonglin Jiang
Yong Chen
33
0
0
10 Sep 2024
ADD 2023: Towards Audio Deepfake Detection and Analysis in the Wild
ADD 2023: Towards Audio Deepfake Detection and Analysis in the Wild
Jiangyan Yi
Chu Yuan Zhang
Jianhua Tao
Chenglong Wang
Xinrui Yan
Yong Ren
Hao Gu
Junzuo Zhou
50
1
0
09 Aug 2024
Central Kurdish Text-to-Speech Synthesis with Novel End-to-End
  Transformer Training
Central Kurdish Text-to-Speech Synthesis with Novel End-to-End Transformer Training
Hawraz A. Ahmad
Tarik A. Rashid
33
0
0
06 Aug 2024
Source Tracing of Audio Deepfake Systems
Source Tracing of Audio Deepfake Systems
Nicholas Klein
Tianxiang Chen
Hemlata Tak
Ricardo Casal
Elie Khoury
32
4
0
10 Jul 2024
RapVerse: Coherent Vocals and Whole-Body Motions Generations from Text
RapVerse: Coherent Vocals and Whole-Body Motions Generations from Text
Jiaben Chen
Xin Yan
Yihang Chen
Siyuan Cen
Qinwei Ma
Haoyu Zhen
Kaizhi Qian
Lie Lu
Chuang Gan
38
0
0
30 May 2024
Training Generative Adversarial Network-Based Vocoder with Limited Data
  Using Augmentation-Conditional Discriminator
Training Generative Adversarial Network-Based Vocoder with Limited Data Using Augmentation-Conditional Discriminator
Takuhiro Kaneko
Hirokazu Kameoka
Kou Tanaka
21
0
0
25 Mar 2024
RFWave: Multi-band Rectified Flow for Audio Waveform Reconstruction
RFWave: Multi-band Rectified Flow for Audio Waveform Reconstruction
Peng Liu
Dongyang Dai
Zhiyong Wu
24
2
0
08 Mar 2024
EVA-GAN: Enhanced Various Audio Generation via Scalable Generative
  Adversarial Networks
EVA-GAN: Enhanced Various Audio Generation via Scalable Generative Adversarial Networks
Shijia Liao
Shiyi Lan
Arun George Zachariah
10
1
0
31 Jan 2024
Ultra-lightweight Neural Differential DSP Vocoder For High Quality
  Speech Synthesis
Ultra-lightweight Neural Differential DSP Vocoder For High Quality Speech Synthesis
Prabhav Agrawal
Thilo Köhler
Zhiping Xiu
Prashant Serai
Qing He
18
1
0
19 Jan 2024
Enabling Acoustic Audience Feedback in Large Virtual Events
Enabling Acoustic Audience Feedback in Large Virtual Events
Tamay Aykut
M. Hofbauer
Christopher B. Kuhn
Eckehard Steinbach
Bernd Girod
38
0
0
27 Oct 2023
FastGraphTTS: An Ultrafast Syntax-Aware Speech Synthesis Framework
FastGraphTTS: An Ultrafast Syntax-Aware Speech Synthesis Framework
Jianzong Wang
Xulong Zhang
Aolan Sun
Ning Cheng
Jing Xiao
29
1
0
16 Sep 2023
Cross-lingual Knowledge Distillation via Flow-based Voice Conversion for
  Robust Polyglot Text-To-Speech
Cross-lingual Knowledge Distillation via Flow-based Voice Conversion for Robust Polyglot Text-To-Speech
Dariusz Piotrowski
Renard Korzeniowski
Alessio Falai
Sebastian Cygert
Kamil Pokora
Georgi Tinchev
Ziyao Zhang
K. Yanagisawa
23
1
0
15 Sep 2023
Distinguishing Neural Speech Synthesis Models Through Fingerprints in
  Speech Waveforms
Distinguishing Neural Speech Synthesis Models Through Fingerprints in Speech Waveforms
Chu Yuan Zhang
Jiangyan Yi
Jianhua Tao
Chenglong Wang
Xinrui Yan
8
2
0
13 Sep 2023
AI-Generated Content (AIGC) for Various Data Modalities: A Survey
AI-Generated Content (AIGC) for Various Data Modalities: A Survey
Lin Geng Foo
Hossein Rahmani
J. Liu
73
31
0
27 Aug 2023
iSTFTNet2: Faster and More Lightweight iSTFT-Based Neural Vocoder Using
  1D-2D CNN
iSTFTNet2: Faster and More Lightweight iSTFT-Based Neural Vocoder Using 1D-2D CNN
Takuhiro Kaneko
Hirokazu Kameoka
Kou Tanaka
Shogo Seki
23
4
0
14 Aug 2023
Large-scale unsupervised audio pre-training for video-to-speech
  synthesis
Large-scale unsupervised audio pre-training for video-to-speech synthesis
Triantafyllos Kefalas
Yannis Panagakis
M. Pantic
VGen
32
3
0
27 Jun 2023
High-Fidelity Audio Compression with Improved RVQGAN
High-Fidelity Audio Compression with Improved RVQGAN
Rithesh Kumar
Prem Seetharaman
Alejandro Luebs
I. Kumar
Kundan Kumar
33
282
0
11 Jun 2023
Towards generalizing deep-audio fake detection networks
Towards generalizing deep-audio fake detection networks
Konstantin Gasenzer
Moritz Wolter
28
4
0
22 May 2023
APNet: An All-Frame-Level Neural Vocoder Incorporating Direct Prediction
  of Amplitude and Phase Spectra
APNet: An All-Frame-Level Neural Vocoder Incorporating Direct Prediction of Amplitude and Phase Spectra
Yang Ai
Zhenhua Ling
20
13
0
13 May 2023
Using Deepfake Technologies for Word Emphasis Detection
Using Deepfake Technologies for Word Emphasis Detection
Eran Kaufman
Lee-Ad Gottlieb
14
0
0
12 May 2023
AI-Synthesized Voice Detection Using Neural Vocoder Artifacts
AI-Synthesized Voice Detection Using Neural Vocoder Artifacts
Chengzhe Sun
Shan Jia
Shuwei Hou
Siwei Lyu
30
38
0
25 Apr 2023
Wave-U-Net Discriminator: Fast and Lightweight Discriminator for
  Generative Adversarial Network-Based Speech Synthesis
Wave-U-Net Discriminator: Fast and Lightweight Discriminator for Generative Adversarial Network-Based Speech Synthesis
Takuhiro Kaneko
Hirokazu Kameoka
Kou Tanaka
Shogo Seki
21
9
0
24 Mar 2023
Configurable EBEN: Extreme Bandwidth Extension Network to enhance
  body-conducted speech capture
Configurable EBEN: Extreme Bandwidth Extension Network to enhance body-conducted speech capture
Hauret Julien
Joubaud Thomas
V. Zimpfer
Bavu Éric
11
6
0
17 Mar 2023
Continuous descriptor-based control for deep audio synthesis
Continuous descriptor-based control for deep audio synthesis
Ninon Devis
Nils Demerlé
Sarah Nabi
David Genova
P. Esling
17
9
0
27 Feb 2023
Exposing AI-Synthesized Human Voices Using Neural Vocoder Artifacts
Exposing AI-Synthesized Human Voices Using Neural Vocoder Artifacts
Chengzhe Sun
Shan Jia
Shuwei Hou
Ehab AlBadawy
Siwei Lyu
120
3
0
18 Feb 2023
Towards Building Text-To-Speech Systems for the Next Billion Users
Towards Building Text-To-Speech Systems for the Next Billion Users
Gokul Karthik Kumar
V. PraveenS.
Pratyush Kumar
Mitesh M. Khapra
Karthik Nandakumar
36
18
0
17 Nov 2022
Distinguishable Speaker Anonymization based on Formant and Fundamental
  Frequency Scaling
Distinguishable Speaker Anonymization based on Formant and Fundamental Frequency Scaling
Jixun Yao
Qing Wang
Yi Lei
Pengcheng Guo
Linfu Xie
Namin Wang
Jie Liu
25
13
0
06 Nov 2022
DSPGAN: a GAN-based universal vocoder for high-fidelity TTS by
  time-frequency domain supervision from DSP
DSPGAN: a GAN-based universal vocoder for high-fidelity TTS by time-frequency domain supervision from DSP
Kun Song
Yongmao Zhang
Yinjiao Lei
Jian Cong
Hanzhao Li
Linfu Xie
Gang He
Jinfeng Bai
51
15
0
02 Nov 2022
Modelling black-box audio effects with time-varying feature modulation
Modelling black-box audio effects with time-varying feature modulation
Marco Comunità
C. Steinmetz
Huy Phan
Joshua D. Reiss
41
14
0
01 Nov 2022
Robust MelGAN: A robust universal neural vocoder for high-fidelity TTS
Robust MelGAN: A robust universal neural vocoder for high-fidelity TTS
Kun Song
Jian Cong
Xinsheng Wang
Yongmao Zhang
Linfu Xie
Ning Jiang
Haiying Wu
19
0
0
31 Oct 2022
Lightweight and High-Fidelity End-to-End Text-to-Speech with Multi-Band
  Generation and Inverse Short-Time Fourier Transform
Lightweight and High-Fidelity End-to-End Text-to-Speech with Multi-Band Generation and Inverse Short-Time Fourier Transform
Masaya Kawamura
Yuma Shirahata
Ryuichi Yamamoto
Kentaro Tachibana
24
15
0
28 Oct 2022
Spoofed training data for speech spoofing countermeasure can be
  efficiently created using neural vocoders
Spoofed training data for speech spoofing countermeasure can be efficiently created using neural vocoders
Xin Wang
Junichi Yamagishi
16
36
0
19 Oct 2022
SpecRNet: Towards Faster and More Accessible Audio DeepFake Detection
SpecRNet: Towards Faster and More Accessible Audio DeepFake Detection
Piotr Kawa
Marcin Plata
P. Syga
24
14
0
12 Oct 2022
WaveFit: An Iterative and Non-autoregressive Neural Vocoder based on
  Fixed-Point Iteration
WaveFit: An Iterative and Non-autoregressive Neural Vocoder based on Fixed-Point Iteration
Yuma Koizumi
Kohei Yatabe
Heiga Zen
M. Bacchiani
DiffM
42
29
0
03 Oct 2022
NWPU-ASLP System for the VoicePrivacy 2022 Challenge
NWPU-ASLP System for the VoicePrivacy 2022 Challenge
Jixun Yao
Qing Wang
Li Lyna Zhang
Pengcheng Guo
Yuhao Liang
Linfu Xie
PICV
21
16
0
24 Sep 2022
HPPNet: Modeling the Harmonic Structure and Pitch Invariance in Piano
  Transcription
HPPNet: Modeling the Harmonic Structure and Pitch Invariance in Piano Transcription
Weixing Wei
P. Li
Yi Yu
Wei Li
11
14
0
30 Aug 2022
An Initial Investigation for Detecting Vocoder Fingerprints of Fake
  Audio
An Initial Investigation for Detecting Vocoder Fingerprints of Fake Audio
Xin Yan
Jiangyan Yi
J. Tao
Chenglong Wang
Haoxin Ma
Tao Wang
Shiming Wang
Ruibo Fu
12
25
0
20 Aug 2022
A Study of Modeling Rising Intonation in Cantonese Neural Speech
  Synthesis
A Study of Modeling Rising Intonation in Cantonese Neural Speech Synthesis
Qibing Bai
Tom Ko
Yu Zhang
16
4
0
03 Aug 2022
Subband-based Generative Adversarial Network for Non-parallel
  Many-to-many Voice Conversion
Subband-based Generative Adversarial Network for Non-parallel Many-to-many Voice Conversion
Jianchun Ma
Zhedong Zheng
Hao Fei
Feng Zheng
Tat-Seng Chua
Yi Yang
GAN
29
0
0
13 Jul 2022
CFAD: A Chinese Dataset for Fake Audio Detection
CFAD: A Chinese Dataset for Fake Audio Detection
Haoxin Ma
Jiangyan Yi
Chenglong Wang
Xin Yan
J. Tao
Tao Wang
Shiming Wang
Ruibo Fu
16
26
0
12 Jul 2022
End-to-end speech recognition modeling from de-identified data
End-to-end speech recognition modeling from de-identified data
M. Flechl
Shou-Chun Yin
Junho Park
Peter Skala
13
4
0
12 Jul 2022
WeSinger 2: Fully Parallel Singing Voice Synthesis via Multi-Singer
  Conditional Adversarial Training
WeSinger 2: Fully Parallel Singing Voice Synthesis via Multi-Singer Conditional Adversarial Training
Zewang Zhang
Yibin Zheng
Xinhui Li
Li Lu
DiffM
23
11
0
05 Jul 2022
123
Next