UnivNet: A Neural Vocoder with Multi-Resolution Spectrogram Discriminators for High-Fidelity Waveform Generation

Interspeech, 2021
15 June 2021
Won Jang
D. Lim
Jaesam Yoon
Bongwan Kim
Juntae Kim

Papers citing "UnivNet: A Neural Vocoder with Multi-Resolution Spectrogram Discriminators for High-Fidelity Waveform Generation"

50 / 94 papers shown
UniTok-Audio: A Unified Audio Generation Framework via Generative Modeling on Discrete Codec Tokens
Chengwei Liu
Haoyin Yan
Shaofei Xue
Xiaotao Liang
Yinghao Liu
Zheng Xue
Gang Song
Boyang Zhou
235
2
0
30 Oct 2025
SynthVC: Leveraging Synthetic Data for End-to-End Low Latency Streaming Voice Conversion
Zhao Guo
Ziqian Ning
Guobin Ma
Lei Xie
SyDa
93
0
0
10 Oct 2025
Beyond Static Knowledge Messengers: Towards Adaptive, Fair, and Scalable Federated Learning for Medical AI
Jahidul Arafat
Fariha Tasmin
Sanjaya Poudel
Ahsan Habib Tareq
FedML
218
0
0
05 Oct 2025
NLDSI-BWE: Non Linear Dynamical Systems-Inspired Multi Resolution Discriminators for Speech Bandwidth Extension
Tarikul Islam Tamiti
Anomadarshi Barua
126
0
0
01 Oct 2025
AUDETER: A Large-scale Dataset for Deepfake Audio Detection in Open Worlds
Qizhou Wang
Hanxun Huang
Guansong Pang
S. Erfani
Christopher Leckie
117
0
0
04 Sep 2025
FasterVoiceGrad: Faster One-step Diffusion-Based Voice Conversion with Adversarial Diffusion Conversion Distillation
Takuhiro Kaneko
Hirokazu Kameoka
Kou Tanaka
Yuto Kondo
77
0
0
25 Aug 2025
Vocoder-Projected Feature Discriminator
Takuhiro Kaneko
Hirokazu Kameoka
Kou Tanaka
Yuto Kondo
DiffM
140
0
0
25 Aug 2025
Unseen Speaker and Language Adaptation for Lightweight Text-To-Speech with Adapters
Alessio Falai
Ziyao Zhang
Akos Gangoly
108
0
0
25 Aug 2025
MahaTTS: A Unified Framework for Multilingual Text-to-Speech Synthesis
Jaskaran Singh
Amartya Roy Chowdhury
Raghav Prabhakar
Varshul C. W
91
0
0
05 Aug 2025
Enhancing Spectrogram Realism in Singing Voice Synthesis via Explicit Bandwidth Extension Prior to Vocoder
Runxuan Yang
Kai Li
Guo Chen
Xiaolin Hu
113
0
0
03 Aug 2025
Learning Neural Vocoder from Range-Null Space Decomposition
International Joint Conference on Artificial Intelligence (IJCAI), 2025
Andong Li
Tong Lei
Zhihang Sun
Rilin Chen
Erwei Yin
Xiaodong Li
C. Zheng
163
2
0
28 Jul 2025
Nonlinear Framework for Speech Bandwidth Extension
Tarikul Islam Tamiti
Nursad Mamun
Anomadarshi Barua
175
0
0
21 Jul 2025
Towards Bitrate-Efficient and Noise-Robust Speech Coding with Variable Bitrate RVQ
Yunkee Chae
Kyogu Lee
134
2
0
19 Jun 2025
BemaGANv2: A Tutorial and Comparative Survey of GAN-based Vocoders for Long-Term Audio Generation
Taesoo Park
Mungwi Jeong
Mingyu Park
Narae Kim
Junyoung Kim
Mujung Kim
Jisang Yoo
Hoyun Lee
Sanghoon Kim
Soonchul Kwon
180
0
0
11 Jun 2025
SpINRv2: Implicit Neural Representation for Passband FMCW Radars
Harshvardhan Takawale
Nirupam Roy
195
0
0
09 Jun 2025
Comparative Analysis of Fast and High-Fidelity Neural Vocoders for Low-Latency Streaming Synthesis in Resource-Constrained Environments
Reo Yoneyama
Masaya Kawamura
Ryo Terashima
Ryuichi Yamamoto
Tomoki Toda
260
0
0
04 Jun 2025
SwitchCodec: A High-Fidelity Nerual Audio Codec With Sparse Quantization
Jin Wang
Wenbin Jiang
Xiangbo Wang
Yubo You
Sheng Fang
254
0
0
30 May 2025
DPN-GAN: Inducing Periodic Activations in Generative Adversarial Networks for High-Fidelity Audio Synthesis
IEEE Access, 2025
Zeeshan Ahmad
Shudi Bao
Meng Chen
219
1
0
14 May 2025
L3AC: Towards a Lightweight and Lossless Audio Codec
Linwei Zhai
Yunpeng Song
Cui Zhao
Haiwei Yang
Ge Wang
Wang Zhi
Wei Xi
MQ
302
1
0
07 Apr 2025
A Survey on Music Generation from Single-Modal, Cross-Modal, and Multi-Modal Perspectives
Shuyu Li
Shulei Ji
Zihao Wang
Songruoyao Wu
Jiaxing Yu
Jianchao Tan
MGen
VGen
563
3
0
01 Apr 2025
SupertonicTTS: Towards Highly Efficient and Streamlined Text-to-Speech System
Hyeongju Kim
Jinhyeok Yang
Yechan Yu
Seunghun Ji
Jacob Morton
Frederik Bous
Joon Byun
Juheon Lee
431
1
0
29 Mar 2025
WaveFM: A High-Fidelity and Efficient Vocoder Based on Flow Matching
North American Chapter of the Association for Computational Linguistics (NAACL), 2025
Tianze Luo
Xingchen Miao
Wenbo Duan
DiffM
235
6
0
20 Mar 2025
MegaTTS 3: Sparse Alignment Enhanced Latent Diffusion Transformer for Zero-Shot Speech Synthesis
Ziyue Jiang
Yi Ren
Ruiqi Li
Shengpeng Ji
Zhenhui Ye
...
Yanzhe Zhang
Rui Liu
Xiang Yin
Zhou Zhao
513
0
0
26 Feb 2025
High-Fidelity Music Vocoder using Neural Audio Codecs
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025
Luca A. Lanzendörfer
Florian Grötschla
Michael Ungersböck
Roger Wattenhofer
299
2
0
18 Feb 2025
FlashSR: One-step Versatile Audio Super-resolution via Diffusion Distillation
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025
Jaekwon Im
Juhan Nam
DiffM
327
4
0
18 Jan 2025
KALL-E: Autoregressive Speech Synthesis with Next-Distribution Prediction
Kangxiang Xia
Xinfa Zhu
Lei Xie
WenJie Tian
W. Li
Lei Xie
VLM
425
8
0
22 Dec 2024
ESTVocoder: An Excitation-Spectral-Transformed Neural Vocoder Conditioned on Mel Spectrogram
Xiao-Hang Jiang
Hui-Peng Du
Yang Ai
Ye-Xin Lu
Zhen-Hua Ling
210
0
0
18 Nov 2024
Wavehax: Aliasing-Free Neural Waveform Synthesis Based on 2D Convolution and Harmonic Prior for Reliable Complex Spectrogram Estimation
Reo Yoneyama
Atsushi Miyashita
Ryuichi Yamamoto
Tomoki Toda
268
4
0
11 Nov 2024
MDCTCodec: A Lightweight MDCT-based Neural Audio Codec towards High Sampling Rate and Low Bitrate Scenarios
Spoken Language Technology Workshop (SLT), 2024
Xiao-Hang Jiang
Yang Ai
Rui Zheng
Hui-Peng Du
Ye-Xin Lu
Zhen-Hua Ling
271
10
0
01 Nov 2024
APCodec+: A Spectrum-Coding-Based High-Fidelity and High-Compression-Rate Neural Audio Codec with Staged Training Paradigm
International Symposium on Chinese Spoken Language Processing (ISCSLP), 2024
Hui-Peng Du
Yang Ai
Rui Zheng
Zhen-Hua Ling
218
5
0
30 Oct 2024
SNAC: Multi-Scale Neural Audio Codec
Hubert Siuzdak
Florian Grötschla
Luca A. Lanzendörfer
138
45
0
18 Oct 2024
Bahasa Harmony: A Comprehensive Dataset for Bahasa Text-to-Speech Synthesis with Discrete Codec Modeling of EnGen-TTS
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Onkar Kishor Susladkar
Vishesh Tripathi
Biddwan Ahmed
126
0
0
09 Oct 2024
InstructSing: High-Fidelity Singing Voice Generation via Instructing Yourself
Spoken Language Technology Workshop (SLT), 2024
Chang Zeng
Chunhui Wang
Xiaoxiao Miao
Jian Zhao
Zhonglin Jiang
Yong Chen
217
1
0
10 Sep 2024
FastVoiceGrad: One-step Diffusion-Based Voice Conversion with Adversarial Conditional Diffusion Distillation
Interspeech, 2024
Takuhiro Kaneko
Hirokazu Kameoka
Kou Tanaka
Yuto Kondo
DiffM
241
6
0
03 Sep 2024
WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling
International Conference on Learning Representations (ICLR), 2024
Shengpeng Ji
Ziyue Jiang
Xize Cheng
Yifu Chen
Minghui Fang
...
Rongjie Huang
Yidi Jiang
Qian Chen
Zhou Zhao
VLM
377
120
0
29 Aug 2024
Accelerating High-Fidelity Waveform Generation via Adversarial Flow Matching Optimization
Sang-Hoon Lee
Ha-Yeong Choi
Seong-Whan Lee
AI4TS
236
5
0
15 Aug 2024
PeriodWave: Multi-Period Flow Matching for High-Fidelity Waveform Generation
International Conference on Learning Representations (ICLR), 2024
Sang-Hoon Lee
Ha-Yeong Choi
Seong-Whan Lee
OOD
DiffM
AI4TS
305
13
0
14 Aug 2024
Speech Editing -- a Summary
Tobias Kässmann
Yining Liu
Danni Liu
150
1
0
24 Jul 2024
Improving Unsupervised Clean-to-Rendered Guitar Tone Transformation Using GANs and Integrated Unaligned Clean Data
Yu-Hua Chen
Woosung Choi
Wei-Hsiang Liao
Marco A. Martínez-Ramírez
K. Cheuk
Yuki Mitsufuji
J. Jang
Yi-Hsuan Yang
183
6
0
22 Jun 2024
Period Singer: Integrating Periodic and Aperiodic Variational Autoencoders for Natural-Sounding End-to-End Singing Voice Synthesis
Interspeech, 2024
Taewoo Kim
Choongsang Cho
Young Han Lee
AI4TS
148
4
0
14 Jun 2024
VISinger2+: End-to-End Singing Voice Synthesis Augmented by Self-Supervised Learning Representation
Yifeng Yu
Jiatong Shi
Yuning Wu
Shinji Watanabe
214
9
0
13 Jun 2024
RaD-Net 2: A causal two-stage repairing and denoising speech enhancement network with knowledge distillation and complex axial self-attention
Mingshuai Liu
Zhuangqi Chen
Xiaopeng Yan
Yuanjun Lv
Xianjun Xia
Chuanzeng Huang
Yijian Xiao
Lei Xie
170
8
0
11 Jun 2024
JenGAN: Stacked Shifted Filters in GAN-Based Speech Synthesis
Interspeech, 2024
Hyunjae Cho
Junhyeok Lee
Wonbin Jung
197
3
0
10 Jun 2024
BiVocoder: A Bidirectional Neural Vocoder Integrating Feature Extraction and Waveform Generation
Hui-Peng Du
Ye-Xin Lu
Yang Ai
Zhen-Hua Ling
117
4
0
04 Jun 2024
HILCodec: High Fidelity and Lightweight Neural Audio Codec
S. Ahn
Beom Jun Woo
Mingrui Han
Chanyeong Moon
Nam Soo Kim
261
16
0
08 May 2024
HAM-TTS: Hierarchical Acoustic Modeling for Token-Based Zero-Shot Text-to-Speech with Model and Data Scaling
Chunhui Wang
Chang Zeng
Bowen Zhang
Ziyang Ma
Yefan Zhu
Zifeng Cai
Jian Zhao
Zhonglin Jiang
Yong Chen
SyDa
130
8
0
09 Mar 2024
Language-Codec: Bridging Discrete Codec Representations and Speech Language Models
Shengpeng Ji
Minghui Fang
Ziyue Jiang
Dingdong Wang
Hanting Wang
Jialung Zuo
Shulei Wang
AuLLM
338
16
0
19 Feb 2024
APCodec: A Neural Audio Codec with Parallel Amplitude and Phase Spectrum Encoding and Decoding
Yang Ai
Xiao-Hang Jiang
Ye-Xin Lu
Hui-Peng Du
Zhenhua Ling
183
42
0
16 Feb 2024
BASE TTS: Lessons from building a billion-parameter Text-to-Speech model on 100K hours of data
Mateusz Lajszczak
Guillermo Cámbara
Yang Li
Fatih Beyhan
Arent van Korlaar
...
Bartosz Putrycz
Soledad López Gambino
Kayeon Yoo
Elena Sokolova
Thomas Drugman
LM&MA
366
112
0
12 Feb 2024
EVA-GAN: Enhanced Various Audio Generation via Scalable Generative Adversarial Networks
Shijia Liao
Shiyi Lan
Arun George Zachariah
125
3
0
31 Jan 2024