2203.16749
SpecGrad: Diffusion Probabilistic Model based Neural Vocoder with Adaptive Noise Spectral Shaping
Interspeech, 2022
31 March 2022
Yuma Koizumi
Heiga Zen
Kohei Yatabe
Nanxin Chen
M. Bacchiani
DiffM
Papers citing "SpecGrad: Diffusion Probabilistic Model based Neural Vocoder with Adaptive Noise Spectral Shaping"
40 / 40 papers shown
GLA-Grad++: An Improved Griffin-Lim Guided Diffusion Model for Speech Synthesis
Teysir Baoueb
Xiaoyu Bie
Mathieu Fontaine
Gaël Richard
DiffM
144
0
0
27 Nov 2025
An Octave-based Multi-Resolution CQT Architecture for Diffusion-based Audio Generation
Maurício do V. M. da Costa
Eloi Moliner
DiffM
230
1
0
20 Sep 2025
Audio Generation Through Score-Based Generative Modeling: Design Principles and Implementation
Ge Zhu
Yutong Wen
Zhiyao Duan
DiffM
MedIm
326
3
0
10 Jun 2025
Source Separation by Flow Matching
Robin Scheibler
John R. Hershey
Arnaud Doucet
Henry Li
550
4
0
22 May 2025
WaveFM: A High-Fidelity and Efficient Vocoder Based on Flow Matching
North American Chapter of the Association for Computational Linguistics (NAACL), 2025
Tianze Luo
Xingchen Miao
Wenbo Duan
DiffM
275
10
0
20 Mar 2025
UniWav: Towards Unified Pre-training for Speech Representation Learning and Generation
International Conference on Learning Representations (ICLR), 2025
Alexander H. Liu
Sang-gil Lee
Chao-Han Huck Yang
Yuan Gong
Yu-Chun Wang
James Glass
Rafael Valle
Bryan Catanzaro
SSL
436
5
0
02 Mar 2025
RestoreGrad: Signal Restoration Using Conditional Denoising Diffusion Models with Jointly Learned Prior
Ching Hua Lee
Chouchang Yang
Jaejin Cho
Yashas Malur Saidutta
R. S. Srinivasa
Yilin Shen
Hongxia Jin
DiffM
640
2
0
19 Feb 2025
Wavehax: Aliasing-Free Neural Waveform Synthesis Based on 2D Convolution and Harmonic Prior for Reliable Complex Spectrogram Estimation
IEEE Transactions on Audio, Speech, and Language Processing (IEEE TASLP), 2024
Reo Yoneyama
Atsushi Miyashita
Ryuichi Yamamoto
Tomoki Toda
353
5
0
11 Nov 2024
SF-Speech: Straightened Flow for Zero-Shot Voice Clone
IEEE Transactions on Audio, Speech, and Language Processing (TASLP), 2024
Xuyuan Li
Zengqiang Shang
Hua Hua
Peiyang Shi
Chen Yang
Li Wang
Pengyuan Zhang
601
5
0
16 Oct 2024
Facial Expression-Enhanced TTS: Combining Face Representation and Emotion Intensity for Adaptive Speech
Yunji Chu
Yunseob Shim
Unsang Park
266
2
0
24 Sep 2024
DPI-TTS: Directional Patch Interaction for Fast-Converging and Style Temporal Modeling in Text-to-Speech
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Xin Qi
Ruibo Fu
Zhengqi Wen
Tao Wang
Chunyu Qiang
...
Xiaopeng Wang
Yuankun Xie
Yukun Liu
Zhengqi Wen
Guanjun Li
DiffM
351
1
0
18 Sep 2024
Accelerating High-Fidelity Waveform Generation via Adversarial Flow Matching Optimization
Sang-Hoon Lee
Ha-Yeong Choi
Seong-Whan Lee
AI4TS
362
7
0
15 Aug 2024
PeriodWave: Multi-Period Flow Matching for High-Fidelity Waveform Generation
International Conference on Learning Representations (ICLR), 2024
Sang-Hoon Lee
Ha-Yeong Choi
Seong-Whan Lee
OOD
DiffM
AI4TS
395
17
0
14 Aug 2024
FreeV: Free Lunch For Vocoders Through Pseudo Inversed Mel Filter
Yuanjun Lv
Hai Li
Ying Yan
Junhui Liu
Danming Xie
Lei Xie
269
12
0
12 Jun 2024
Detecting Out-Of-Distribution Earth Observation Images with Diffusion Models
Georges Le Bellier
Nicolas Audebert
325
13
0
19 Apr 2024
RFWave: Multi-band Rectified Flow for Audio Waveform Reconstruction
International Conference on Learning Representations (ICLR), 2024
Peng Liu
Dongyang Dai
Zhiyong Wu
572
15
0
08 Mar 2024
PeriodGrad: Towards Pitch-Controllable Neural Vocoder Based on a Diffusion Probabilistic Model
Yukiya Hono
Kei Hashimoto
Yoshihiko Nankaku
Keiichi Tokuda
DiffM
268
8
0
22 Feb 2024
GLA-Grad: A Griffin-Lim Extended Waveform Generation Diffusion Model
Haocheng Liu
Teysir Baoueb
Mathieu Fontaine
Jonathan Le Roux
Gaël Richard
264
10
0
09 Feb 2024
SpecDiff-GAN: A Spectrally-Shaped Noise Diffusion GAN for Speech and Music Synthesis
Teysir Baoueb
Haocheng Liu
Mathieu Fontaine
Jonathan Le Roux
Gaël Richard
DiffM
280
9
0
30 Jan 2024
FreGrad: Lightweight and Fast Frequency-aware Diffusion Vocoder
Tan Dat Nguyen
Ji-Hoon Kim
Youngjoon Jang
Jaehun Kim
Joon Son Chung
DiffM
306
21
0
18 Jan 2024
Generative Pre-training for Speech with Flow Matching
International Conference on Learning Representations (ICLR), 2023
Alexander H. Liu
Matt Le
Apoorv Vyas
Bowen Shi
Andros Tjandra
Wei-Ning Hsu
448
65
0
25 Oct 2023
BigVSAN: Enhancing GAN-based Neural Vocoders with Slicing Adversarial Network
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Takashi Shibuya
Yuhta Takida
Yuki Mitsufuji
362
19
0
06 Sep 2023
HierVST: Hierarchical Adaptive Zero-shot Voice Style Transfer
Interspeech, 2023
Sang-Hoon Lee
Haram Choi
H. Oh
Seong-Whan Lee
BDL
353
15
0
30 Jul 2023
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
Neural Information Processing Systems (NeurIPS), 2023
Yinghao Aaron Li
Cong Han
Vinay S. Raghavan
Gavin Mischler
N. Mesgarani
VLM
DiffM
369
241
0
13 Jun 2023
DiffSketching: Sketch Control Image Synthesis with Diffusion Models
British Machine Vision Conference (BMVC), 2023
Qiang Wang
Di Kong
Fengyin Lin
Yonggang Qi
DiffM
341
26
0
30 May 2023
FastFit: Towards Real-Time Iterative Neural Vocoder by Replacing U-Net Encoder With Multiple STFTs
Interspeech, 2023
Won Jang
D. Lim
Heayoung Park
250
1
0
18 May 2023
Learn to Sing by Listening: Building Controllable Virtual Singer by Unsupervised Learning from Voice Recordings
Wei Xue
Yiwen Wang
Qi-fei Liu
Yi-Ting Guo
206
1
0
09 May 2023
StyleDiffusion: Prompt-Embedding Inversion for Text-Based Editing
Senmao Li
Joost van de Weijer
Taihang Hu
Fahad Shahbaz Khan
Qibin Hou
Yaxing Wang
Jian Yang
DiffM
539
79
0
28 Mar 2023
A Survey on Audio Diffusion Models: Text To Speech Synthesis and Enhancement in Generative AI
Chenshuang Zhang
Chaoning Zhang
Sheng Zheng
Mengchun Zhang
Maryam Qamar
Sung-Ho Bae
In So Kweon
DiffM
MedIm
339
110
0
23 Mar 2023
Miipher: A Robust Speech Restoration Model Integrating Self-Supervised Speech and Text Representations
IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 2023
Yuma Koizumi
Heiga Zen
Shigeki Karita
Yifan Ding
Kohei Yatabe
Nobuyuki Morioka
Yu Zhang
Wei Han
Ankur Bapna
M. Bacchiani
318
48
0
03 Mar 2023
Imaginary Voice: Face-styled Diffusion Model for Text-to-Speech
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Jiyoung Lee
Joon Son Chung
Soo-Whan Chung
DiffM
250
47
0
27 Feb 2023
ResGrad: Residual Denoising Diffusion Probabilistic Models for Text to Speech
Ze Chen
Yihan Wu
Yichong Leng
Jiawei Chen
Haohe Liu
...
Ke Wang
Lei He
Sheng Zhao
Jiang Bian
Danilo Mandic
DiffM
315
26
0
30 Dec 2022
Framewise WaveGAN: High Speed Adversarial Vocoder in Time Domain with Very Low Computational Complexity
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Ahmed Mustafa
J. Valin
Jan Büthe
Paris Smaragdis
Mike Goodwin
202
8
0
08 Dec 2022
HouseDiffusion: Vector Floorplan Generation via a Diffusion Model with Discrete and Continuous Denoising
Computer Vision and Pattern Recognition (CVPR), 2022
M. Shabani
Sepidehsadat Hosseini
Yasutaka Furukawa
DiffM
306
121
0
23 Nov 2022
Diffusion-based Generative Speech Source Separation
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Robin Scheibler
Youna Ji
Soo-Whan Chung
J. Byun
Soyeon Choe
Min-Seok Choi
DiffM
453
67
0
31 Oct 2022
Robust One-Shot Singing Voice Conversion
Naoya Takahashi
M. Singh
Yuki Mitsufuji
DiffM
312
9
0
20 Oct 2022
Hierarchical Diffusion Models for Singing Voice Neural Vocoder
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Naoya Takahashi
Mayank Kumar Singh
Yuki Mitsufuji
DiffM
357
19
0
14 Oct 2022
WaveFit: An Iterative and Non-autoregressive Neural Vocoder based on Fixed-Point Iteration
Spoken Language Technology Workshop (SLT), 2022
Yuma Koizumi
Kohei Yatabe
Heiga Zen
M. Bacchiani
DiffM
255
36
0
03 Oct 2022
A Survey on Generative Diffusion Model
IEEE Transactions on Knowledge and Data Engineering (TKDE), 2022
Hanqun Cao
Cheng Tan
Zhangyang Gao
Yilun Xu
Guangyong Chen
Pheng-Ann Heng
Stan Z. Li
MedIm
1.1K
485
0
06 Sep 2022
Speech Enhancement and Dereverberation with Diffusion-based Generative Models
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2022
Julius Richter
Simon Welker
Jean-Marie Lemercier
Bunlong Lay
Timo Gerkmann
DiffM
507
354
0
11 Aug 2022