SpecGrad: Diffusion Probabilistic Model based Neural Vocoder with Adaptive Noise Spectral Shaping

Interspeech, 2022
31 March 2022
Yuma Koizumi
Heiga Zen
Kohei Yatabe
Nanxin Chen
M. Bacchiani
    DiffM

Papers citing "SpecGrad: Diffusion Probabilistic Model based Neural Vocoder with Adaptive Noise Spectral Shaping"

40 / 40 papers shown
GLA-Grad++: An Improved Griffin-Lim Guided Diffusion Model for Speech Synthesis
Teysir Baoueb
Xiaoyu Bie
Mathieu Fontaine
Gaël Richard
DiffM
144
0
0
27 Nov 2025
An Octave-based Multi-Resolution CQT Architecture for Diffusion-based Audio Generation
Maurício do V. M. da Costa
Eloi Moliner
DiffM
230
1
0
20 Sep 2025
Audio Generation Through Score-Based Generative Modeling: Design Principles and Implementation
Ge Zhu
Yutong Wen
Zhiyao Duan
DiffM, MedIm
326
3
0
10 Jun 2025
Source Separation by Flow Matching
Robin Scheibler
John R. Hershey
Arnaud Doucet
Henry Li
550
4
0
22 May 2025
WaveFM: A High-Fidelity and Efficient Vocoder Based on Flow Matching
North American Chapter of the Association for Computational Linguistics (NAACL), 2025
Tianze Luo
Xingchen Miao
Wenbo Duan
DiffM
275
10
0
20 Mar 2025
UniWav: Towards Unified Pre-training for Speech Representation Learning and Generation
International Conference on Learning Representations (ICLR), 2025
Alexander H. Liu
Sang-gil Lee
Chao-Han Huck Yang
Yuan Gong
Yu-Chun Wang
James Glass
Rafael Valle
Bryan Catanzaro
SSL
436
5
0
02 Mar 2025
RestoreGrad: Signal Restoration Using Conditional Denoising Diffusion Models with Jointly Learned Prior
Ching Hua Lee
Chouchang Yang
Jaejin Cho
Yashas Malur Saidutta
R. S. Srinivasa
Yilin Shen
Hongxia Jin
DiffM
640
2
0
19 Feb 2025
Wavehax: Aliasing-Free Neural Waveform Synthesis Based on 2D Convolution and Harmonic Prior for Reliable Complex Spectrogram Estimation
IEEE Transactions on Audio, Speech, and Language Processing (IEEE TASLP), 2024
Reo Yoneyama
Atsushi Miyashita
Ryuichi Yamamoto
Tomoki Toda
353
5
0
11 Nov 2024
SF-Speech: Straightened Flow for Zero-Shot Voice Clone
IEEE Transactions on Audio, Speech, and Language Processing (TASLP), 2024
Xuyuan Li
Zengqiang Shang
Hua Hua
Peiyang Shi
Chen Yang
Li Wang
Pengyuan Zhang
601
5
0
16 Oct 2024
Facial Expression-Enhanced TTS: Combining Face Representation and Emotion Intensity for Adaptive Speech
Yunji Chu
Yunseob Shim
Unsang Park
266
2
0
24 Sep 2024
DPI-TTS: Directional Patch Interaction for Fast-Converging and Style Temporal Modeling in Text-to-Speech
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Xin Qi
Ruibo Fu
Zhengqi Wen
Tao Wang
Chunyu Qiang
...
Xiaopeng Wang
Yuankun Xie
Yukun Liu
Zhengqi Wen
Guanjun Li
DiffM
351
1
0
18 Sep 2024
Accelerating High-Fidelity Waveform Generation via Adversarial Flow Matching Optimization
Sang-Hoon Lee
Ha-Yeong Choi
Seong-Whan Lee
AI4TS
362
7
0
15 Aug 2024
PeriodWave: Multi-Period Flow Matching for High-Fidelity Waveform Generation
International Conference on Learning Representations (ICLR), 2024
Sang-Hoon Lee
Ha-Yeong Choi
Seong-Whan Lee
OOD, DiffM, AI4TS
395
17
0
14 Aug 2024
FreeV: Free Lunch For Vocoders Through Pseudo Inversed Mel Filter
Yuanjun Lv
Hai Li
Ying Yan
Junhui Liu
Danming Xie
Lei Xie
269
12
0
12 Jun 2024
Detecting Out-Of-Distribution Earth Observation Images with Diffusion Models
Georges Le Bellier
Nicolas Audebert
325
13
0
19 Apr 2024
RFWave: Multi-band Rectified Flow for Audio Waveform Reconstruction
International Conference on Learning Representations (ICLR), 2024
Peng Liu
Dongyang Dai
Zhiyong Wu
572
15
0
08 Mar 2024
PeriodGrad: Towards Pitch-Controllable Neural Vocoder Based on a Diffusion Probabilistic Model
Yukiya Hono
Kei Hashimoto
Yoshihiko Nankaku
Keiichi Tokuda
DiffM
268
8
0
22 Feb 2024
GLA-Grad: A Griffin-Lim Extended Waveform Generation Diffusion Model
Haocheng Liu
Teysir Baoueb
Mathieu Fontaine
Jonathan Le Roux
Gaël Richard
264
10
0
09 Feb 2024
SpecDiff-GAN: A Spectrally-Shaped Noise Diffusion GAN for Speech and Music Synthesis
Teysir Baoueb
Haocheng Liu
Mathieu Fontaine
Jonathan Le Roux
Gaël Richard
DiffM
280
9
0
30 Jan 2024
FreGrad: Lightweight and Fast Frequency-aware Diffusion Vocoder
Tan Dat Nguyen
Ji-Hoon Kim
Youngjoon Jang
Jaehun Kim
Joon Son Chung
DiffM
306
21
0
18 Jan 2024
Generative Pre-training for Speech with Flow Matching
International Conference on Learning Representations (ICLR), 2023
Alexander H. Liu
Matt Le
Apoorv Vyas
Bowen Shi
Andros Tjandra
Wei-Ning Hsu
448
65
0
25 Oct 2023
BigVSAN: Enhancing GAN-based Neural Vocoders with Slicing Adversarial Network
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Takashi Shibuya
Yuhta Takida
Yuki Mitsufuji
362
19
0
06 Sep 2023
HierVST: Hierarchical Adaptive Zero-shot Voice Style Transfer
Interspeech, 2023
Sang-Hoon Lee
Haram Choi
H. Oh
Seong-Whan Lee
BDL
353
15
0
30 Jul 2023
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
Neural Information Processing Systems (NeurIPS), 2023
Yinghao Aaron Li
Cong Han
Vinay S. Raghavan
Gavin Mischler
N. Mesgarani
VLM, DiffM
369
241
0
13 Jun 2023
DiffSketching: Sketch Control Image Synthesis with Diffusion Models
British Machine Vision Conference (BMVC), 2023
Qiang Wang
Di Kong
Fengyin Lin
Yonggang Qi
DiffM
341
26
0
30 May 2023
FastFit: Towards Real-Time Iterative Neural Vocoder by Replacing U-Net Encoder With Multiple STFTs
Interspeech, 2023
Won Jang
D. Lim
Heayoung Park
250
1
0
18 May 2023
Learn to Sing by Listening: Building Controllable Virtual Singer by Unsupervised Learning from Voice Recordings
Wei Xue
Yiwen Wang
Qi-fei Liu
Yi-Ting Guo
206
1
0
09 May 2023
StyleDiffusion: Prompt-Embedding Inversion for Text-Based Editing
Senmao Li
Joost van de Weijer
Taihang Hu
Fahad Shahbaz Khan
Qibin Hou
Yaxing Wang
Jian Yang
DiffM
539
79
0
28 Mar 2023
A Survey on Audio Diffusion Models: Text To Speech Synthesis and Enhancement in Generative AI
Chenshuang Zhang
Chaoning Zhang
Sheng Zheng
Mengchun Zhang
Maryam Qamar
Sung-Ho Bae
In So Kweon
DiffM, MedIm
339
110
0
23 Mar 2023
Miipher: A Robust Speech Restoration Model Integrating Self-Supervised Speech and Text Representations
IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 2023
Yuma Koizumi
Heiga Zen
Shigeki Karita
Yifan Ding
Kohei Yatabe
Nobuyuki Morioka
Yu Zhang
Wei Han
Ankur Bapna
M. Bacchiani
318
48
0
03 Mar 2023
Imaginary Voice: Face-styled Diffusion Model for Text-to-Speech
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Jiyoung Lee
Joon Son Chung
Soo-Whan Chung
DiffM
250
47
0
27 Feb 2023
ResGrad: Residual Denoising Diffusion Probabilistic Models for Text to Speech
Ze Chen
Yihan Wu
Yichong Leng
Jiawei Chen
Haohe Liu
...
Ke Wang
Lei He
Sheng Zhao
Jiang Bian
Danilo Mandic
DiffM
315
26
0
30 Dec 2022
Framewise WaveGAN: High Speed Adversarial Vocoder in Time Domain with Very Low Computational Complexity
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Ahmed Mustafa
J. Valin
Jan Büthe
Paris Smaragdis
Mike Goodwin
202
8
0
08 Dec 2022
HouseDiffusion: Vector Floorplan Generation via a Diffusion Model with Discrete and Continuous Denoising
Computer Vision and Pattern Recognition (CVPR), 2022
M. Shabani
Sepidehsadat Hosseini
Yasutaka Furukawa
DiffM
306
121
0
23 Nov 2022
Diffusion-based Generative Speech Source Separation
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Robin Scheibler
Youna Ji
Soo-Whan Chung
J. Byun
Soyeon Choe
Min-Seok Choi
DiffM
453
67
0
31 Oct 2022
Robust One-Shot Singing Voice Conversion
Naoya Takahashi
M. Singh
Yuki Mitsufuji
DiffM
312
9
0
20 Oct 2022
Hierarchical Diffusion Models for Singing Voice Neural Vocoder
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Naoya Takahashi
Mayank Kumar Singh
Yuki Mitsufuji
DiffM
357
19
0
14 Oct 2022
WaveFit: An Iterative and Non-autoregressive Neural Vocoder based on Fixed-Point Iteration
Spoken Language Technology Workshop (SLT), 2022
Yuma Koizumi
Kohei Yatabe
Heiga Zen
M. Bacchiani
DiffM
255
36
0
03 Oct 2022
A Survey on Generative Diffusion Model
IEEE Transactions on Knowledge and Data Engineering (TKDE), 2022
Hanqun Cao
Cheng Tan
Zhangyang Gao
Yilun Xu
Guangyong Chen
Pheng-Ann Heng
Stan Z. Li
MedIm
1.1K
485
0
06 Sep 2022
Speech Enhancement and Dereverberation with Diffusion-based Generative Models
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2022
Julius Richter
Simon Welker
Jean-Marie Lemercier
Bunlong Lay
Timo Gerkmann
DiffM
507
354
0
11 Aug 2022