WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis

Interspeech (Interspeech), 2021

17 June 2021

Najim Dehak

Papers citing "WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis"

50 / 63 papers shown

Visually Grounded Narratives: Reducing Cognitive Burden in Researcher-Participant Interaction

165

30 Aug 2025

Fine-Tuning Text-to-Speech Diffusion Models Using Reinforcement Learning with Human Feedback

157

05 Aug 2025

Flow Matching Policy Gradients

369

28 Jul 2025

ItDPDM: Information-Theoretic Discrete Poisson Diffusion Model

Amit Kumar Singh Yadav

DiffM

771

08 May 2025

MAVFlow: Preserving Paralinguistic Elements with Conditional Flow Matching for Zero-Shot AV2AV Multilingual Translation

396

14 Mar 2025

Diffuse or Confuse: A Diffusion Deepfake Speech Dataset

Biometrics and Electronic Signatures (BES), 2024

323

09 Oct 2024

Task-Agnostic Pre-training and Task-Guided Fine-tuning for Versatile Diffusion Planner

458

30 Sep 2024

DiffSSD: A Diffusion-Based Dataset For Speech Forensics

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024

Kratika Bhagtani

Amit Kumar Singh Yadav

Paolo Bestagini

Edward J. Delp

DiffM

243

19 Sep 2024

Central Kurdish Text-to-Speech Synthesis with Novel End-to-End Transformer Training

Hawraz A. Ahmad

Tarik A. Rashid

289

06 Aug 2024

WebRPG: Automatic Web Rendering Parameters Generation for Visual Presentation

245

22 Jul 2024

GVDIFF: Grounded Text-to-Video Generation with Diffusion Models

Ruixiang Li

282

02 Jul 2024

Should you use a probabilistic duration model in TTS? Probably! Especially for spontaneous speech

Interspeech (Interspeech), 2024

245

08 Jun 2024

Reinforcement Learning for Fine-tuning Text-to-speech Diffusion Models

116

23 May 2024

G4G: A Generic Framework for High Fidelity Talking Face Generation with Fine-grained Intra-modal Alignment

312

28 Feb 2024

On the Semantic Latent Space of Diffusion-Based Text-to-Speech Models

Miri Varshavsky-Hassid

274

19 Feb 2024

Classification Diffusion Models: Revitalizing Density Ratio Estimation

303

15 Feb 2024

Sampler Scheduler for Diffusion Models

Zitong Cheng

DiffM

263

12 Nov 2023

Lightweight Diffusion Models with Distillation-Based Block Neural Architecture Search

270

08 Nov 2023

E3 TTS: Easy End-to-End Diffusion-based Text to Speech

Automatic Speech Recognition & Understanding (ASRU), 2023

354

02 Nov 2023

Improving End-to-End Speech Processing by Efficient Text Data Utilization with Latent Synthesis

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

Xingshan Zeng

313

09 Oct 2023

DiffAR: Denoising Diffusion Autoregressive Model for Raw Speech Waveform Generation

International Conference on Learning Representations (ICLR), 2023

621

02 Oct 2023

FastGraphTTS: An Ultrafast Syntax-Aware Speech Synthesis Framework

IEEE International Conference on Tools with Artificial Intelligence (ICTAI), 2023

210

16 Sep 2023

Matcha-TTS: A fast TTS architecture with conditional flow matching

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

356

208

06 Sep 2023

DiCLET-TTS: Diffusion Model based Cross-lingual Emotion Transfer for Text-to-Speech -- A Study between English and Mandarin

IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023

Jian Cong

260

02 Sep 2023

VideoGen: A Reference-Guided Latent Diffusion Approach for High Definition Text-to-Video Generation

Errui Ding

Jingdong Wang

VGen

392

01 Sep 2023

DiffV2S: Diffusion-based Video-to-Speech Synthesis with Vision-guided Speaker Embedding

IEEE International Conference on Computer Vision (ICCV), 2023

237

15 Aug 2023

iSTFTNet2: Faster and More Lightweight iSTFT-Based Neural Vocoder Using 1D-2D CNN

Interspeech (Interspeech), 2023

203

14 Aug 2023

DPM-OT: A New Diffusion Probabilistic Model Based on Optimal Transport

IEEE International Conference on Computer Vision (ICCV), 2023

209

21 Jul 2023

The Ethical Implications of Generative Audio Models: A Systematic Literature Review

AAAI/ACM Conference on AI, Ethics, and Society (AIES), 2023

J. Barnett

291

07 Jul 2023

Diff-TTSG: Denoising probabilistic integrated speech and gesture synthesis

Speech Synthesis Workshop (SSW), 2023

373

15 Jun 2023

StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models

Neural Information Processing Systems (NeurIPS), 2023

Cong Han

360

241

13 Jun 2023

HiddenSinger: High-Quality Singing Voice Synthesis via Neural Audio Codec and Latent Diffusion Models

Neural Networks (Neural Netw.), 2023

227

12 Jun 2023

Nested Diffusion Processes for Anytime Image Generation

IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023

406

30 May 2023

Negative-prompt Inversion: Fast Image Inversion for Editing with Text-guided Diffusion Models

IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023

304

184

26 May 2023

SEEDS: Exponential SDE Solvers for Fast High-Quality Sampling from Diffusion Models

Neural Information Processing Systems (NeurIPS), 2023

369

23 May 2023

Contrastive Energy Prediction for Exact Energy-Guided Diffusion Sampling in Offline Reinforcement Learning

International Conference on Machine Learning (ICML), 2023

Cheng Lu

Huayu Chen

Jianfei Chen

Hang Su

Chongxuan Li

Jun Zhu

DiffM OffRL

355

132

25 Apr 2023

DiffVoice: Text-to-Speech with Latent Diffusion

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

Zhijun Liu

Yiwei Guo

K. Yu

DiffM

219

23 Apr 2023

A Survey on Audio Diffusion Models: Text To Speech Synthesis and Enhancement in Generative AI

Mengchun Zhang

In So Kweon

337

110

23 Mar 2023

A Complete Survey on Generative AI (AIGC): Is ChatGPT from GPT-4 to GPT-5 All You Need?

Yu Qiao

...

Lik-Hang Lee

Yang Yang

Heng Tao Shen

In So Kweon

Choong Seon Hong

387

208

21 Mar 2023

GECCO: Geometrically-Conditioned Point Diffusion Models

IEEE International Conference on Computer Vision (ICCV), 2023

357

10 Mar 2023

Understanding Diffusion Objectives as the ELBO with Simple Data Augmentation

Neural Information Processing Systems (NeurIPS), 2023

Diederik P. Kingma

Ruiqi Gao

DiffM

927

280

01 Mar 2023

DIFUSCO: Graph-based Diffusion Solvers for Combinatorial Optimization

Neural Information Processing Systems (NeurIPS), 2023

Zhiqing Sun

Yiming Yang

DiffM

411

254

16 Feb 2023

Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization

257

26 Jan 2023

Separate And Diffuse: Using a Pretrained Diffusion Model for Improving Source Separation

Shahar Lutati

Eliya Nachmani

Lior Wolf

DiffM

267

25 Jan 2023

Text-to-speech synthesis based on latent variable conversion using diffusion probabilistic model and variational autoencoder

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022

Yusuke Yasuda

Tomoki Toda

DiffM

266

16 Dec 2022

DPM-Solver++: Fast Solver for Guided Sampling of Diffusion Probabilistic Models

Machine Intelligence Research (MIR), 2022

1.1K

940

02 Nov 2022

LION: Latent Point Diffusion Models for 3D Shape Generation

Neural Information Processing Systems (NeurIPS), 2022

Sanja Fidler

471

662

12 Oct 2022

GENIE: Higher-Order Denoising Diffusion Solvers

Neural Information Processing Systems (NeurIPS), 2022

504

149

11 Oct 2022

Imagen Video: High Definition Video Generation with Diffusion Models

Ruiqi Gao

...

David J. Fleet

578

1,972

05 Oct 2022

R-MelNet: Reduced Mel-Spectral Modeling for Neural TTS

Kyle Kastner

Aaron Courville

206

30 Jun 2022