UnivNet: A Neural Vocoder with Multi-Resolution Spectrogram Discriminators for High-Fidelity Waveform Generation

Interspeech (Interspeech), 2021
15 June 2021
Won Jang
D. Lim
Jaesam Yoon
Bongwan Kim
Juntae Kim

Papers citing "UnivNet: A Neural Vocoder with Multi-Resolution Spectrogram Discriminators for High-Fidelity Waveform Generation"

44 / 94 papers shown
SpecDiff-GAN: A Spectrally-Shaped Noise Diffusion GAN for Speech and Music Synthesis
Teysir Baoueb
Haocheng Liu
Mathieu Fontaine
Jonathan Le Roux
Gaël Richard
DiffM
246
8
0
30 Jan 2024
Intelli-Z: Toward Intelligible Zero-Shot TTS
Sunghee Jung
Won Jang
Jaesam Yoon
Bongwan Kim
178
1
0
25 Jan 2024
Towards High-Quality and Efficient Speech Bandwidth Extension with Parallel Amplitude and Phase Prediction
IEEE Transactions on Audio, Speech, and Language Processing (IEEE TASLP), 2024
Ye-Xin Lu
Yang Ai
Hui-Peng Du
Zhenhua Ling
196
26
0
12 Jan 2024
RaD-Net: A Repairing and Denoising Network for Speech Signal Improvement
Mingshuai Liu
Zhuangqi Chen
Xiaopeng Yan
Yuanjun Lv
Xianjun Xia
Chuanzeng Huang
Yijian Xiao
Lei Xie
208
6
0
09 Jan 2024
ZMM-TTS: Zero-shot Multilingual and Multispeaker Speech Synthesis Conditioned on Self-supervised Discrete Speech Representations
Cheng Gong
Xin Wang
Erica Cooper
Dan Wells
Longbiao Wang
Jianwu Dang
Korin Richmond
Junichi Yamagishi
262
36
0
22 Dec 2023
NoLACE: Improving Low-Complexity Speech Codec Enhancement Through Adaptive Temporal Shaping
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Jan Büthe
Ahmed Mustafa
J. Valin
Karim Helwani
Mike Goodwin
219
7
0
25 Sep 2023
HiFTNet: A Fast High-Quality Neural Vocoder with Harmonic-plus-Noise Filter and Inverse Short Time Fourier Transform
Yinghao Aaron Li
Cong Han
Xilin Jiang
N. Mesgarani
247
6
0
18 Sep 2023
SnakeGAN: A Universal Vocoder Leveraging DDSP Prior Knowledge and Periodic Inductive Bias
IEEE International Conference on Multimedia and Expo (ICME), 2023
Sipan Li
Songxiang Liu
Lu Zhang
Xiang Li
Yanyao Bian
Chao Weng
Zhiyong Wu
Helen Meng
104
3
0
14 Sep 2023
QS-TTS: Towards Semi-Supervised Text-to-Speech Synthesis via Vector-Quantized Self-Supervised Speech Representation Learning
IEEE Transactions on Audio, Speech, and Language Processing (TASLP), 2023
Haohan Guo
Fenglong Xie
Jiawen Kang
Yujia Xiao
Xixin Wu
Helen M. Meng
138
4
0
31 Aug 2023
Expressive paragraph text-to-speech synthesis with multi-step variational autoencoder
Interspeech (Interspeech), 2023
Xuyuan Li
Zengqiang Shang
Peiyang Shi
Hua Hua
Jian Liu
Pengyuan Zhang
313
0
0
25 Aug 2023
BigWavGAN: A Wave-To-Wave Generative Adversarial Network for Music Super-Resolution
Global Conference on Consumer Electronics (GCE), 2023
Yenan Zhang
Hiroshi Watanabe
169
0
0
12 Aug 2023
Adversarial Training of Denoising Diffusion Model Using Dual Discriminators for High-Fidelity Multi-Speaker TTS
IEEE Open Journal of Signal Processing (IEEE Open J. Signal Process.), 2023
Myeongji Ko
Yong-Hoon Choi
DiffM
161
2
0
03 Aug 2023
HierVST: Hierarchical Adaptive Zero-shot Voice Style Transfer
Interspeech (Interspeech), 2023
Sang-Hoon Lee
Haram Choi
H. Oh
Seong-Whan Lee
BDL
226
14
0
30 Jul 2023
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
Neural Information Processing Systems (NeurIPS), 2023
Yinghao Aaron Li
Cong Han
Vinay S. Raghavan
Gavin Mischler
N. Mesgarani
VLM
DiffM
303
206
0
13 Jun 2023
HiddenSinger: High-Quality Singing Voice Synthesis via Neural Audio Codec and Latent Diffusion Models
Neural Networks (Neural Netw.), 2023
Ji-Sang Hwang
Sang-Hoon Lee
Seong-Whan Lee
DiffM
149
20
0
12 Jun 2023
High-Fidelity Audio Compression with Improved RVQGAN
Neural Information Processing Systems (NeurIPS), 2023
Rithesh Kumar
Prem Seetharaman
Alejandro Luebs
I. Kumar
Kundan Kumar
292
561
0
11 Jun 2023
Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis
International Conference on Learning Representations (ICLR), 2023
Hubert Siuzdak
393
184
0
01 Jun 2023
Efficient Neural Music Generation
Neural Information Processing Systems (NeurIPS), 2023
Max W. Y. Lam
Qiao Tian
Tang-Chun Li
Zongyu Yin
Siyuan Feng
...
Mingbo Ma
Xuchen Song
Jitong Chen
Yuping Wang
Yuxuan Wang
DiffM
MGen
243
82
0
25 May 2023
FastFit: Towards Real-Time Iterative Neural Vocoder by Replacing U-Net Encoder With Multiple STFTs
Interspeech (Interspeech), 2023
Won Jang
D. Lim
Heayoung Park
198
1
0
18 May 2023
FoundationTTS: Text-to-Speech for ASR Customization with Generative Language Model
Rui Xue
Yanqing Liu
Lei He
Xuejiao Tan
Linquan Liu
Ed Lin
Sheng Zhao
310
8
0
06 Mar 2023
On the Audio-visual Synchronization for Lip-to-Speech Synthesis
IEEE International Conference on Computer Vision (ICCV), 2023
Zhe Niu
Brian Mak
168
4
0
01 Mar 2023
Text-only domain adaptation for end-to-end ASR using integrated text-to-mel-spectrogram generator
Interspeech (Interspeech), 2023
Vladimir Bataev
Roman Korostik
Evgeny Shabalin
Vitaly Lavrukhin
Boris Ginsburg
VLM
221
18
0
27 Feb 2023
ResGrad: Residual Denoising Diffusion Probabilistic Models for Text to Speech
Ze Chen
Yihan Wu
Yichong Leng
Jiawei Chen
Haohe Liu
...
Ke Wang
Lei He
Sheng Zhao
Jiang Bian
Danilo Mandic
DiffM
219
26
0
30 Dec 2022
Framewise WaveGAN: High Speed Adversarial Vocoder in Time Domain with Very Low Computational Complexity
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Ahmed Mustafa
J. Valin
Jan Büthe
Paris Smaragdis
Mike Goodwin
134
6
0
08 Dec 2022
VISinger 2: High-Fidelity End-to-End Singing Voice Synthesis Enhanced by Digital Signal Processing Synthesizer
Interspeech (Interspeech), 2022
Yongmao Zhang
Heyang Xue
Hanzhao Li
Linfu Xie
Tingwei Guo
Ruixiong Zhang
Caixia Gong
DiffM
VLM
245
41
0
05 Nov 2022
Robust MelGAN: A robust universal neural vocoder for high-fidelity TTS
International Symposium on Chinese Spoken Language Processing (ISCSLP), 2022
Kun Song
Jian Cong
Xinsheng Wang
Yongmao Zhang
Linfu Xie
Ning Jiang
Haiying Wu
139
0
0
31 Oct 2022
Nonparallel High-Quality Audio Super Resolution with Domain Adaptation and Resampling CycleGANs
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Reo Yoneyama
Ryuichi Yamamoto
Kentaro Tachibana
166
9
0
28 Oct 2022
Source-Filter HiFi-GAN: Fast and Pitch Controllable High-Fidelity Neural Vocoder
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Reo Yoneyama
Yi-Chiao Wu
Tomoki Toda
257
35
0
27 Oct 2022
Towards High-Quality Neural TTS for Low-Resource Languages by Learning Compact Speech Representations
Haohan Guo
Fenglong Xie
Xixin Wu
Hui Lu
Helen Meng
723
3
0
27 Oct 2022
HiFi-WaveGAN: Generative Adversarial Network with Auxiliary Spectrogram-Phase Loss for High-Fidelity Singing Voice Generation
International Symposium on Neural Networks (ISNN), 2022
Chunhui Wang
Chang Zeng
Jun Chen
Xingji He
257
7
0
23 Oct 2022
WaveFit: An Iterative and Non-autoregressive Neural Vocoder based on Fixed-Point Iteration
Spoken Language Technology Workshop (SLT), 2022
Yuma Koizumi
Kohei Yatabe
Heiga Zen
M. Bacchiani
DiffM
220
33
0
03 Oct 2022
A Multi-Stage Multi-Codebook VQ-VAE Approach to High-Performance Neural TTS
Interspeech (Interspeech), 2022
Haohan Guo
Fenglong Xie
Frank Soong
Xixin Wu
Helen M. Meng
153
15
0
22 Sep 2022
Music Separation Enhancement with Generative Modeling
International Society for Music Information Retrieval Conference (ISMIR), 2022
N. Schaffer
Boaz Cogan
Ethan Manilow
Max Morrison
Prem Seetharaman
Bryan Pardo
213
11
0
26 Aug 2022
Avocodo: Generative Adversarial Network for Artifact-free Vocoder
AAAI Conference on Artificial Intelligence (AAAI), 2022
Taejun Bak
Junmo Lee
Hanbin Bae
Jinhyeok Yang
Jaesung Bae
Young-Sun Joo
260
41
0
27 Jun 2022
WOLONet: Wave Outlooker for Efficient and High Fidelity Speech Synthesis
Yi Wang
Yi Si
89
0
0
20 Jun 2022
BigVGAN: A Universal Neural Vocoder with Large-Scale Training
International Conference on Learning Representations (ICLR), 2022
Sang-gil Lee
Ming-Yu Liu
Boris Ginsburg
Bryan Catanzaro
Sung-Hoon Yoon
307
379
0
09 Jun 2022
End-to-End Zero-Shot Voice Conversion with Location-Variable Convolutions
Interspeech (Interspeech), 2022
Wonjune Kang
M. Hasegawa-Johnson
D. Roy
242
10
0
19 May 2022
FastDiff: A Fast Conditional Diffusion Model for High-Quality Speech Synthesis
International Joint Conference on Artificial Intelligence (IJCAI), 2022
Rongjie Huang
Max W. Y. Lam
Jun Wang
Jane Polak Scowcroft
Dong Yu
Yi Ren
Zhou Zhao
DiffM
149
208
0
21 Apr 2022
Adversarial Learning of Intermediate Acoustic Feature for End-to-End Lightweight Text-to-Speech
Interspeech (Interspeech), 2022
Hyungchan Yoon
Seyun Um
Changwhan Kim
Hong-Goo Kang
160
0
0
05 Apr 2022
JETS: Jointly Training FastSpeech2 and HiFi-GAN for End to End Text to Speech
Interspeech (Interspeech), 2022
D. Lim
Sunghee Jung
Eesung Kim
358
63
0
31 Mar 2022
Phase-Aware Spoof Speech Detection Based on Res2Net with Phase Network
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Juntae Kim
S. Ban
218
23
0
21 Mar 2022
A Multi-Scale Time-Frequency Spectrogram Discriminator for GAN-based Non-Autoregressive TTS
Interspeech (Interspeech), 2022
Haohan Guo
Hui Lu
Xixin Wu
Helen Meng
819
9
0
02 Mar 2022
RefineGAN: Universally Generating Waveform Better than Ground Truth with Highly Accurate Pitch and Intensity Responses
Interspeech (Interspeech), 2021
Shengyuan Xu
Wenxiao Zhao
Jing Guo
239
14
0
01 Nov 2021
FlowVocoder: A small Footprint Neural Vocoder based Normalizing flow for Speech Synthesis
Interspeech (Interspeech), 2021
Manh Luong
Viet-Anh Tran
103
3
0
27 Sep 2021