SpecGrad: Diffusion Probabilistic Model based Neural Vocoder with Adaptive Noise Spectral Shaping

Interspeech (Interspeech), 2022

31 March 2022

Papers citing "SpecGrad: Diffusion Probabilistic Model based Neural Vocoder with Adaptive Noise Spectral Shaping"

40 / 40 papers shown

GLA-Grad++: An Improved Griffin-Lim Guided Diffusion Model for Speech Synthesis

144

27 Nov 2025

An Octave-based Multi-Resolution CQT Architecture for Diffusion-based Audio Generation

Maurício do V. M. da Costa

Eloi Moliner

DiffM

230

20 Sep 2025

Audio Generation Through Score-Based Generative Modeling: Design Principles and Implementation

326

10 Jun 2025

Source Separation by Flow Matching

550

22 May 2025

WaveFM: A High-Fidelity and Efficient Vocoder Based on Flow Matching

North American Chapter of the Association for Computational Linguistics (NAACL), 2025

275

20 Mar 2025

UniWav: Towards Unified Pre-training for Speech Representation Learning and Generation

International Conference on Learning Representations (ICLR), 2025

436

02 Mar 2025

RestoreGrad: Signal Restoration Using Conditional Denoising Diffusion Models with Jointly Learned Prior

Ching Hua Lee

Chouchang Yang

Jaejin Cho

Yashas Malur Saidutta

640

19 Feb 2025

Wavehax: Aliasing-Free Neural Waveform Synthesis Based on 2D Convolution and Harmonic Prior for Reliable Complex Spectrogram Estimation

IEEE Transactions on Audio, Speech, and Language Processing (IEEE TASLP), 2024

353

11 Nov 2024

SF-Speech: Straightened Flow for Zero-Shot Voice Clone

IEEE Transactions on Audio, Speech, and Language Processing (TASLP), 2024

601

16 Oct 2024

Facial Expression-Enhanced TTS: Combining Face Representation and Emotion Intensity for Adaptive Speech

Yunji Chu

Yunseob Shim

Unsang Park

266

24 Sep 2024

DPI-TTS: Directional Patch Interaction for Fast-Converging and Style Temporal Modeling in Text-to-Speech

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024

Tao Wang

...

Xiaopeng Wang

Yuankun Xie

Yukun Liu

Zhengqi Wen

Guanjun Li

DiffM

351

18 Sep 2024

Accelerating High-Fidelity Waveform Generation via Adversarial Flow Matching Optimization

Sang-Hoon Lee

Ha-Yeong Choi

Seong-Whan Lee

AI4TS

362

15 Aug 2024

PeriodWave: Multi-Period Flow Matching for High-Fidelity Waveform Generation

International Conference on Learning Representations (ICLR), 2024

Sang-Hoon Lee

Ha-Yeong Choi

Seong-Whan Lee

OOD DiffM AI4TS

395

14 Aug 2024

FreeV: Free Lunch For Vocoders Through Pseudo Inversed Mel Filter

Yuanjun Lv

Hai Li

Ying Yan

Junhui Liu

Danming Xie

Lei Xie

269

12 Jun 2024

Detecting Out-Of-Distribution Earth Observation Images with Diffusion Models

Georges Le Bellier

Nicolas Audebert

325

19 Apr 2024

RFWave: Multi-band Rectified Flow for Audio Waveform Reconstruction

International Conference on Learning Representations (ICLR), 2024

Peng Liu

Dongyang Dai

Zhiyong Wu

572

08 Mar 2024

PeriodGrad: Towards Pitch-Controllable Neural Vocoder Based on a Diffusion Probabilistic Model

268

22 Feb 2024

GLA-Grad: A Griffin-Lim Extended Waveform Generation Diffusion Model

264

09 Feb 2024

SpecDiff-GAN: A Spectrally-Shaped Noise Diffusion GAN for Speech and Music Synthesis

280

30 Jan 2024

FreGrad: Lightweight and Fast Frequency-aware Diffusion Vocoder

Ji-Hoon Kim

Joon Son Chung

306

18 Jan 2024

Generative Pre-training for Speech with Flow Matching

International Conference on Learning Representations (ICLR), 2023

448

25 Oct 2023

BigVSAN: Enhancing GAN-based Neural Vocoders with Slicing Adversarial Network

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

Takashi Shibuya

Yuhta Takida

Yuki Mitsufuji

362

06 Sep 2023

HierVST: Hierarchical Adaptive Zero-shot Voice Style Transfer

Interspeech (Interspeech), 2023

353

30 Jul 2023

StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models

Neural Information Processing Systems (NeurIPS), 2023

Cong Han

369

241

13 Jun 2023

DiffSketching: Sketch Control Image Synthesis with Diffusion Models

British Machine Vision Conference (BMVC), 2023

341

30 May 2023

FastFit: Towards Real-Time Iterative Neural Vocoder by Replacing U-Net Encoder With Multiple STFTs

Interspeech (Interspeech), 2023

Won Jang

D. Lim

Heayoung Park

250

18 May 2023

Learn to Sing by Listening: Building Controllable Virtual Singer by Unsupervised Learning from Voice Recordings

Wei Xue

Yiwen Wang

Qi-fei Liu

Yi-Ting Guo

206

09 May 2023

StyleDiffusion: Prompt-Embedding Inversion for Text-Based Editing

Jian Yang

539

28 Mar 2023

A Survey on Audio Diffusion Models: Text To Speech Synthesis and Enhancement in Generative AI

Mengchun Zhang

In So Kweon

339

110

23 Mar 2023

Miipher: A Robust Speech Restoration Model Integrating Self-Supervised Speech and Text Representations

IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 2023

318

03 Mar 2023

Imaginary Voice: Face-styled Diffusion Model for Text-to-Speech

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

Jiyoung Lee

Joon Son Chung

Soo-Whan Chung

DiffM

250

27 Feb 2023

ResGrad: Residual Denoising Diffusion Probabilistic Models for Text to Speech

...

Jiang Bian

315

30 Dec 2022

Framewise WaveGAN: High Speed Adversarial Vocoder in Time Domain with Very Low Computational Complexity

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022

202

08 Dec 2022

HouseDiffusion: Vector Floorplan Generation via a Diffusion Model with Discrete and Continuous Denoising

Computer Vision and Pattern Recognition (CVPR), 2022

M. Shabani

Sepidehsadat Hosseini

Yasutaka Furukawa

DiffM

306

121

23 Nov 2022

Diffusion-based Generative Speech Source Separation

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022

453

31 Oct 2022

Robust One-Shot Singing Voice Conversion

312

20 Oct 2022

Hierarchical Diffusion Models for Singing Voice Neural Vocoder

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022

357

14 Oct 2022

WaveFit: An Iterative and Non-autoregressive Neural Vocoder based on Fixed-Point Iteration

Spoken Language Technology Workshop (SLT), 2022

255

03 Oct 2022

A Survey on Generative Diffusion Model

IEEE Transactions on Knowledge and Data Engineering (TKDE), 2022

Hanqun Cao

Cheng Tan

Zhangyang Gao

Yilun Xu

Guangyong Chen

Pheng-Ann Heng

Stan Z. Li

MedIm

1.1K

485

06 Sep 2022

Speech Enhancement and Dereverberation with Diffusion-based Generative Models

IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2022

507

354

11 Aug 2022