UnivNet: A Neural Vocoder with Multi-Resolution Spectrogram Discriminators for High-Fidelity Waveform Generation

Interspeech (Interspeech), 2021

15 June 2021

Papers citing "UnivNet: A Neural Vocoder with Multi-Resolution Spectrogram Discriminators for High-Fidelity Waveform Generation"

44 / 94 papers shown

SpecDiff-GAN: A Spectrally-Shaped Noise Diffusion GAN for Speech and Music Synthesis

246

30 Jan 2024

Intelli-Z: Toward Intelligible Zero-Shot TTS

178

25 Jan 2024

Towards High-Quality and Efficient Speech Bandwidth Extension with Parallel Amplitude and Phase Prediction

IEEE Transactions on Audio, Speech, and Language Processing (IEEE TASLP), 2024

196

12 Jan 2024

RaD-Net: A Repairing and Denoising Network for Speech Signal Improvement

Lei Xie

208

09 Jan 2024

ZMM-TTS: Zero-shot Multilingual and Multispeaker Speech Synthesis Conditioned on Self-supervised Discrete Speech Representations

Xin Wang

Longbiao Wang

262

22 Dec 2023

NoLACE: Improving Low-Complexity Speech Codec Enhancement Through Adaptive Temporal Shaping

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

219

25 Sep 2023

HiFTNet: A Fast High-Quality Neural Vocoder with Harmonic-plus-Noise Filter and Inverse Short Time Fourier Transform

Yinghao Aaron Li

Cong Han

Xilin Jiang

N. Mesgarani

247

18 Sep 2023

SnakeGAN: A Universal Vocoder Leveraging DDSP Prior Knowledge and Periodic Inductive Bias

IEEE International Conference on Multimedia and Expo (ICME), 2023

Xiang Li

Zhiyong Wu

104

14 Sep 2023

QS-TTS: Towards Semi-Supervised Text-to-Speech Synthesis via Vector-Quantized Self-Supervised Speech Representation Learning

IEEE Transactions on Audio, Speech, and Language Processing (TASLP), 2023

138

31 Aug 2023

Expressive paragraph text-to-speech synthesis with multi-step variational autoencoder

Interspeech (Interspeech), 2023

Pengyuan Zhang

313

25 Aug 2023

BigWavGAN: A Wave-To-Wave Generative Adversarial Network for Music Super-Resolution

Global Conference on Consumer Electronics (GCE), 2023

Yenan Zhang

Hiroshi Watanabe

169

12 Aug 2023

Adversarial Training of Denoising Diffusion Model Using Dual Discriminators for High-Fidelity Multi-Speaker TTS

IEEE Open Journal of Signal Processing (IEEE Open J. Signal Process.), 2023

Myeongji Ko

Yong-Hoon Choi

DiffM

161

03 Aug 2023

HierVST: Hierarchical Adaptive Zero-shot Voice Style Transfer

Interspeech (Interspeech), 2023

226

30 Jul 2023

StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models

Neural Information Processing Systems (NeurIPS), 2023

Cong Han

303

206

13 Jun 2023

HiddenSinger: High-Quality Singing Voice Synthesis via Neural Audio Codec and Latent Diffusion Models

Neural Networks (Neural Netw.), 2023

149

12 Jun 2023

High-Fidelity Audio Compression with Improved RVQGAN

Neural Information Processing Systems (NeurIPS), 2023

292

561

11 Jun 2023

Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis

International Conference on Learning Representations (ICLR), 2023

Hubert Siuzdak

393

184

01 Jun 2023

Efficient Neural Music Generation

Neural Information Processing Systems (NeurIPS), 2023

...

Jitong Chen

Yuxuan Wang

243

25 May 2023

FastFit: Towards Real-Time Iterative Neural Vocoder by Replacing U-Net Encoder With Multiple STFTs

Interspeech (Interspeech), 2023

Won Jang

D. Lim

Heayoung Park

198

18 May 2023

FoundationTTS: Text-to-Speech for ASR Customization with Generative Language Model

310

06 Mar 2023

On the Audio-visual Synchronization for Lip-to-Speech Synthesis

IEEE International Conference on Computer Vision (ICCV), 2023

Zhe Niu

Brian Mak

168

01 Mar 2023

Text-only domain adaptation for end-to-end ASR using integrated text-to-mel-spectrogram generator

Interspeech (Interspeech), 2023

Boris Ginsburg

221

27 Feb 2023

ResGrad: Residual Denoising Diffusion Probabilistic Models for Text to Speech

...

Jiang Bian

219

30 Dec 2022

Framewise WaveGAN: High Speed Adversarial Vocoder in Time Domain with Very Low Computational Complexity

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022

134

08 Dec 2022

VISinger 2: High-Fidelity End-to-End Singing Voice Synthesis Enhanced by Digital Signal Processing Synthesizer

Interspeech (Interspeech), 2022

245

05 Nov 2022

Robust MelGAN: A robust universal neural vocoder for high-fidelity TTS

International Symposium on Chinese Spoken Language Processing (ISCSLP), 2022

Jian Cong

139

31 Oct 2022

Nonparallel High-Quality Audio Super Resolution with Domain Adaptation and Resampling CycleGANs

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022

Reo Yoneyama

Ryuichi Yamamoto

Kentaro Tachibana

166

28 Oct 2022

Source-Filter HiFi-GAN: Fast and Pitch Controllable High-Fidelity Neural Vocoder

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022

Reo Yoneyama

Yi-Chiao Wu

Tomoki Toda

257

27 Oct 2022

Towards High-Quality Neural TTS for Low-Resource Languages by Learning Compact Speech Representations

723

27 Oct 2022

HiFi-WaveGAN: Generative Adversarial Network with Auxiliary Spectrogram-Phase Loss for High-Fidelity Singing Voice Generation

International Symposium on Neural Networks (ISNN), 2022

Chunhui Wang

Chang Zeng

Jun Chen

Xingji He

257

23 Oct 2022

WaveFit: An Iterative and Non-autoregressive Neural Vocoder based on Fixed-Point Iteration

Spoken Language Technology Workshop (SLT), 2022

220

03 Oct 2022

A Multi-Stage Multi-Codebook VQ-VAE Approach to High-Performance Neural TTS

Interspeech (Interspeech), 2022

153

22 Sep 2022

Music Separation Enhancement with Generative Modeling

International Society for Music Information Retrieval Conference (ISMIR), 2022

213

26 Aug 2022

Avocodo: Generative Adversarial Network for Artifact-free Vocoder

AAAI Conference on Artificial Intelligence (AAAI), 2022

260

27 Jun 2022

WOLONet: Wave Outlooker for Efficient and High Fidelity Speech Synthesis

Yi Wang

Yi Si

20 Jun 2022

BigVGAN: A Universal Neural Vocoder with Large-Scale Training

International Conference on Learning Representations (ICLR), 2022

Boris Ginsburg

307

379

09 Jun 2022

End-to-End Zero-Shot Voice Conversion with Location-Variable Convolutions

Interspeech (Interspeech), 2022

Wonjune Kang

M. Hasegawa-Johnson

D. Roy

242

19 May 2022

FastDiff: A Fast Conditional Diffusion Model for High-Quality Speech Synthesis

International Joint Conference on Artificial Intelligence (IJCAI), 2022

Rongjie Huang

Zhou Zhao

149

208

21 Apr 2022

Adversarial Learning of Intermediate Acoustic Feature for End-to-End Lightweight Text-to-Speech

Interspeech (Interspeech), 2022

160

05 Apr 2022

JETS: Jointly Training FastSpeech2 and HiFi-GAN for End to End Text to Speech

Interspeech (Interspeech), 2022

D. Lim

Sunghee Jung

Eesung Kim

358

31 Mar 2022

Phase-Aware Spoof Speech Detection Based on Res2Net with Phase Network

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022

Juntae Kim

S. Ban

218

21 Mar 2022

A Multi-Scale Time-Frequency Spectrogram Discriminator for GAN-based Non-Autoregressive TTS

Interspeech (Interspeech), 2022

819

02 Mar 2022

RefineGAN: Universally Generating Waveform Better than Ground Truth with Highly Accurate Pitch and Intensity Responses

Interspeech (Interspeech), 2021

Shengyuan Xu

Wenxiao Zhao

Jing Guo

239

01 Nov 2021

FlowVocoder: A small Footprint Neural Vocoder based Normalizing flow for Speech Synthesis

Interspeech (Interspeech), 2021

Manh Luong

Viet-Anh Tran

103

27 Sep 2021