2106.07889
Cited By
UnivNet: A Neural Vocoder with Multi-Resolution Spectrogram Discriminators for High-Fidelity Waveform Generation
Interspeech (Interspeech), 2021
15 June 2021
Won Jang
D. Lim
Jaesam Yoon
Bongwan Kim
Juntae Kim
Papers citing
"UnivNet: A Neural Vocoder with Multi-Resolution Spectrogram Discriminators for High-Fidelity Waveform Generation"
44 / 94 papers shown
SpecDiff-GAN: A Spectrally-Shaped Noise Diffusion GAN for Speech and Music Synthesis
Teysir Baoueb
Haocheng Liu
Mathieu Fontaine
Jonathan Le Roux
Gaël Richard
DiffM
246
8
0
30 Jan 2024
Intelli-Z: Toward Intelligible Zero-Shot TTS
Sunghee Jung
Won Jang
Jaesam Yoon
Bongwan Kim
178
1
0
25 Jan 2024
Towards High-Quality and Efficient Speech Bandwidth Extension with Parallel Amplitude and Phase Prediction
IEEE Transactions on Audio, Speech, and Language Processing (IEEE TASLP), 2024
Ye-Xin Lu
Yang Ai
Hui-Peng Du
Zhenhua Ling
196
26
0
12 Jan 2024
RaD-Net: A Repairing and Denoising Network for Speech Signal Improvement
Mingshuai Liu
Zhuangqi Chen
Xiaopeng Yan
Yuanjun Lv
Xianjun Xia
Chuanzeng Huang
Yijian Xiao
Lei Xie
208
6
0
09 Jan 2024
ZMM-TTS: Zero-shot Multilingual and Multispeaker Speech Synthesis Conditioned on Self-supervised Discrete Speech Representations
Cheng Gong
Xin Wang
Erica Cooper
Dan Wells
Longbiao Wang
Jianwu Dang
Korin Richmond
Junichi Yamagishi
262
36
0
22 Dec 2023
NoLACE: Improving Low-Complexity Speech Codec Enhancement Through Adaptive Temporal Shaping
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Jan Büthe
Ahmed Mustafa
J. Valin
Karim Helwani
Mike Goodwin
219
7
0
25 Sep 2023
HiFTNet: A Fast High-Quality Neural Vocoder with Harmonic-plus-Noise Filter and Inverse Short Time Fourier Transform
Yinghao Aaron Li
Cong Han
Xilin Jiang
N. Mesgarani
247
6
0
18 Sep 2023
SnakeGAN: A Universal Vocoder Leveraging DDSP Prior Knowledge and Periodic Inductive Bias
IEEE International Conference on Multimedia and Expo (ICME), 2023
Sipan Li
Songxiang Liu
Lu Zhang
Xiang Li
Yanyao Bian
Chao Weng
Zhiyong Wu
Helen Meng
104
3
0
14 Sep 2023
QS-TTS: Towards Semi-Supervised Text-to-Speech Synthesis via Vector-Quantized Self-Supervised Speech Representation Learning
IEEE Transactions on Audio, Speech, and Language Processing (TASLP), 2023
Haohan Guo
Fenglong Xie
Jiawen Kang
Yujia Xiao
Xixin Wu
Helen M. Meng
138
4
0
31 Aug 2023
Expressive paragraph text-to-speech synthesis with multi-step variational autoencoder
Interspeech (Interspeech), 2023
Xuyuan Li
Zengqiang Shang
Peiyang Shi
Hua Hua
Jian Liu
Pengyuan Zhang
313
0
0
25 Aug 2023
BigWavGAN: A Wave-To-Wave Generative Adversarial Network for Music Super-Resolution
Global Conference on Consumer Electronics (GCE), 2023
Yenan Zhang
Hiroshi Watanabe
169
0
0
12 Aug 2023
Adversarial Training of Denoising Diffusion Model Using Dual Discriminators for High-Fidelity Multi-Speaker TTS
IEEE Open Journal of Signal Processing (IEEE Open J. Signal Process.), 2023
Myeongji Ko
Yong-Hoon Choi
DiffM
161
2
0
03 Aug 2023
HierVST: Hierarchical Adaptive Zero-shot Voice Style Transfer
Interspeech (Interspeech), 2023
Sang-Hoon Lee
Haram Choi
H. Oh
Seong-Whan Lee
BDL
226
14
0
30 Jul 2023
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
Neural Information Processing Systems (NeurIPS), 2023
Yinghao Aaron Li
Cong Han
Vinay S. Raghavan
Gavin Mischler
N. Mesgarani
VLM
DiffM
303
206
0
13 Jun 2023
HiddenSinger: High-Quality Singing Voice Synthesis via Neural Audio Codec and Latent Diffusion Models
Neural Networks (Neural Netw.), 2023
Ji-Sang Hwang
Sang-Hoon Lee
Seong-Whan Lee
DiffM
149
20
0
12 Jun 2023
High-Fidelity Audio Compression with Improved RVQGAN
Neural Information Processing Systems (NeurIPS), 2023
Rithesh Kumar
Prem Seetharaman
Alejandro Luebs
I. Kumar
Kundan Kumar
292
561
0
11 Jun 2023
Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis
International Conference on Learning Representations (ICLR), 2023
Hubert Siuzdak
393
184
0
01 Jun 2023
Efficient Neural Music Generation
Neural Information Processing Systems (NeurIPS), 2023
Max W. Y. Lam
Qiao Tian
Tang-Chun Li
Zongyu Yin
Siyuan Feng
...
Mingbo Ma
Xuchen Song
Jitong Chen
Yuping Wang
Yuxuan Wang
DiffM
MGen
243
82
0
25 May 2023
FastFit: Towards Real-Time Iterative Neural Vocoder by Replacing U-Net Encoder With Multiple STFTs
Interspeech (Interspeech), 2023
Won Jang
D. Lim
Heayoung Park
198
1
0
18 May 2023
FoundationTTS: Text-to-Speech for ASR Customization with Generative Language Model
Rui Xue
Yanqing Liu
Lei He
Xuejiao Tan
Linquan Liu
Ed Lin
Sheng Zhao
310
8
0
06 Mar 2023
On the Audio-visual Synchronization for Lip-to-Speech Synthesis
IEEE International Conference on Computer Vision (ICCV), 2023
Zhe Niu
Brian Mak
168
4
0
01 Mar 2023
Text-only domain adaptation for end-to-end ASR using integrated text-to-mel-spectrogram generator
Interspeech (Interspeech), 2023
Vladimir Bataev
Roman Korostik
Evgeny Shabalin
Vitaly Lavrukhin
Boris Ginsburg
VLM
221
18
0
27 Feb 2023
ResGrad: Residual Denoising Diffusion Probabilistic Models for Text to Speech
Ze Chen
Yihan Wu
Yichong Leng
Jiawei Chen
Haohe Liu
...
Ke Wang
Lei He
Sheng Zhao
Jiang Bian
Danilo Mandic
DiffM
219
26
0
30 Dec 2022
Framewise WaveGAN: High Speed Adversarial Vocoder in Time Domain with Very Low Computational Complexity
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Ahmed Mustafa
J. Valin
Jan Büthe
Paris Smaragdis
Mike Goodwin
134
6
0
08 Dec 2022
VISinger 2: High-Fidelity End-to-End Singing Voice Synthesis Enhanced by Digital Signal Processing Synthesizer
Interspeech (Interspeech), 2022
Yongmao Zhang
Heyang Xue
Hanzhao Li
Linfu Xie
Tingwei Guo
Ruixiong Zhang
Caixia Gong
DiffM
VLM
245
41
0
05 Nov 2022
Robust MelGAN: A robust universal neural vocoder for high-fidelity TTS
International Symposium on Chinese Spoken Language Processing (ISCSLP), 2022
Kun Song
Jian Cong
Xinsheng Wang
Yongmao Zhang
Linfu Xie
Ning Jiang
Haiying Wu
139
0
0
31 Oct 2022
Nonparallel High-Quality Audio Super Resolution with Domain Adaptation and Resampling CycleGANs
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Reo Yoneyama
Ryuichi Yamamoto
Kentaro Tachibana
166
9
0
28 Oct 2022
Source-Filter HiFi-GAN: Fast and Pitch Controllable High-Fidelity Neural Vocoder
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Reo Yoneyama
Yi-Chiao Wu
Tomoki Toda
257
35
0
27 Oct 2022
Towards High-Quality Neural TTS for Low-Resource Languages by Learning Compact Speech Representations
Haohan Guo
Fenglong Xie
Xixin Wu
Hui Lu
Helen Meng
723
3
0
27 Oct 2022
HiFi-WaveGAN: Generative Adversarial Network with Auxiliary Spectrogram-Phase Loss for High-Fidelity Singing Voice Generation
International Symposium on Neural Networks (ISNN), 2022
Chunhui Wang
Chang Zeng
Jun Chen
Xingji He
257
7
0
23 Oct 2022
WaveFit: An Iterative and Non-autoregressive Neural Vocoder based on Fixed-Point Iteration
Spoken Language Technology Workshop (SLT), 2022
Yuma Koizumi
Kohei Yatabe
Heiga Zen
M. Bacchiani
DiffM
220
33
0
03 Oct 2022
A Multi-Stage Multi-Codebook VQ-VAE Approach to High-Performance Neural TTS
Interspeech (Interspeech), 2022
Haohan Guo
Fenglong Xie
Frank Soong
Xixin Wu
Helen M. Meng
153
15
0
22 Sep 2022
Music Separation Enhancement with Generative Modeling
International Society for Music Information Retrieval Conference (ISMIR), 2022
N. Schaffer
Boaz Cogan
Ethan Manilow
Max Morrison
Prem Seetharaman
Bryan Pardo
213
11
0
26 Aug 2022
Avocodo: Generative Adversarial Network for Artifact-free Vocoder
AAAI Conference on Artificial Intelligence (AAAI), 2022
Taejun Bak
Junmo Lee
Hanbin Bae
Jinhyeok Yang
Jaesung Bae
Young-Sun Joo
260
41
0
27 Jun 2022
WOLONet: Wave Outlooker for Efficient and High Fidelity Speech Synthesis
Yi Wang
Yi Si
89
0
0
20 Jun 2022
BigVGAN: A Universal Neural Vocoder with Large-Scale Training
International Conference on Learning Representations (ICLR), 2022
Sang-gil Lee
Ming-Yu Liu
Boris Ginsburg
Bryan Catanzaro
Sung-Hoon Yoon
307
379
0
09 Jun 2022
End-to-End Zero-Shot Voice Conversion with Location-Variable Convolutions
Interspeech (Interspeech), 2022
Wonjune Kang
M. Hasegawa-Johnson
D. Roy
242
10
0
19 May 2022
FastDiff: A Fast Conditional Diffusion Model for High-Quality Speech Synthesis
International Joint Conference on Artificial Intelligence (IJCAI), 2022
Rongjie Huang
Max W. Y. Lam
Jun Wang
Jane Polak Scowcroft
Dong Yu
Yi Ren
Zhou Zhao
DiffM
149
208
0
21 Apr 2022
Adversarial Learning of Intermediate Acoustic Feature for End-to-End Lightweight Text-to-Speech
Interspeech (Interspeech), 2022
Hyungchan Yoon
Seyun Um
Changwhan Kim
Hong-Goo Kang
160
0
0
05 Apr 2022
JETS: Jointly Training FastSpeech2 and HiFi-GAN for End to End Text to Speech
Interspeech (Interspeech), 2022
D. Lim
Sunghee Jung
Eesung Kim
358
63
0
31 Mar 2022
Phase-Aware Spoof Speech Detection Based on Res2Net with Phase Network
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Juntae Kim
S. Ban
218
23
0
21 Mar 2022
A Multi-Scale Time-Frequency Spectrogram Discriminator for GAN-based Non-Autoregressive TTS
Interspeech (Interspeech), 2022
Haohan Guo
Hui Lu
Xixin Wu
Helen Meng
819
9
0
02 Mar 2022
RefineGAN: Universally Generating Waveform Better than Ground Truth with Highly Accurate Pitch and Intensity Responses
Interspeech (Interspeech), 2021
Shengyuan Xu
Wenxiao Zhao
Jing Guo
239
14
0
01 Nov 2021
FlowVocoder: A small Footprint Neural Vocoder based Normalizing flow for Speech Synthesis
Interspeech (Interspeech), 2021
Manh Luong
Viet-Anh Tran
103
3
0
27 Sep 2021