WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis

Interspeech, 2021
17 June 2021
Nanxin Chen
Yu Zhang
Heiga Zen
Ron J. Weiss
Mohammad Norouzi
Najim Dehak
William Chan
    DiffM

Papers citing "WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis"

50 / 63 papers shown
Visually Grounded Narratives: Reducing Cognitive Burden in Researcher-Participant Interaction
Runtong Wu
Jiayao Song
Fei Teng
Xianhao Ren
Yuyan Gao
Kailun Yang
165
0
0
30 Aug 2025
Fine-Tuning Text-to-Speech Diffusion Models Using Reinforcement Learning with Human Feedback
Jingyi Chen
Ju-Seung Byun
Micha Elsner
Pichao Wang
Andrew Perrault
157
1
0
05 Aug 2025
Flow Matching Policy Gradients
David McAllister
Songwei Ge
Brent Yi
Chung Min Kim
Ethan Weber
Hongsuk Choi
Haiwen Feng
Angjoo Kanazawa
369
41
0
28 Jul 2025
ItDPDM: Information-Theoretic Discrete Poisson Diffusion Model
Sagnik Bhattacharya
Abhiram Gorle
Ahmed Mohsin
Ahsan Bilal
Connor Ding
Amit Kumar Singh Yadav
DiffM
771
1
0
08 May 2025
MAVFlow: Preserving Paralinguistic Elements with Conditional Flow Matching for Zero-Shot AV2AV Multilingual Translation
Sungwoo Cho
J. Choi
Sungnyun Kim
Se-Young Yun
396
0
0
14 Mar 2025
Diffuse or Confuse: A Diffusion Deepfake Speech Dataset
Biometrics and Electronic Signatures (BES), 2024
Anton Firc
K. Malinka
P. Hanáček
DiffM
323
7
0
09 Oct 2024
Task-Agnostic Pre-training and Task-Guided Fine-tuning for Versatile Diffusion Planner
Chenyou Fan
Chenjia Bai
Zhao Shan
Haoran He
Yang Zhang
Zhen Wang
458
4
0
30 Sep 2024
DiffSSD: A Diffusion-Based Dataset For Speech Forensics
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Kratika Bhagtani
Amit Kumar Singh Yadav
Paolo Bestagini
Edward J. Delp
DiffM
243
9
0
19 Sep 2024
Central Kurdish Text-to-Speech Synthesis with Novel End-to-End Transformer Training
Hawraz A. Ahmad
Tarik A. Rashid
289
1
0
06 Aug 2024
WebRPG: Automatic Web Rendering Parameters Generation for Visual Presentation
Zirui Shao
Feiyu Gao
Hangdi Xing
Zepeng Zhu
Zhi Yu
Jiajun Bu
Qi Zheng
Cong Yao
245
6
0
22 Jul 2024
GVDIFF: Grounded Text-to-Video Generation with Diffusion Models
Huanzhang Dou
Ruixiang Li
Wei Su
Xi Li
DiffM
282
2
0
02 Jul 2024
Should you use a probabilistic duration model in TTS? Probably! Especially for spontaneous speech
Interspeech, 2024
Shivam Mehta
Harm Lameris
Rajiv Punmiya
Jonas Beskow
Éva Székely
G. Henter
245
6
0
08 Jun 2024
Reinforcement Learning for Fine-tuning Text-to-speech Diffusion Models
Jingyi Chen
Ju-Seung Byun
Micha Elsner
Andrew Perrault
116
1
0
23 May 2024
G4G: A Generic Framework for High Fidelity Talking Face Generation with Fine-grained Intra-modal Alignment
Juan Zhang
Jiahao Chen
Cheng Wang
Zhi-Yang Yu
Tangquan Qi
Di Wu
CVBM
312
0
0
28 Feb 2024
On the Semantic Latent Space of Diffusion-Based Text-to-Speech Models
Miri Varshavsky-Hassid
Roy Hirsch
Regev Cohen
Tomer Golany
Daniel Freedman
Ehud Rivlin
274
4
0
19 Feb 2024
Classification Diffusion Models: Revitalizing Density Ratio Estimation
Shahar Yadin
Noam Elata
T. Michaeli
DiffM
303
2
0
15 Feb 2024
Sampler Scheduler for Diffusion Models
Zitong Cheng
DiffM
263
2
0
12 Nov 2023
Lightweight Diffusion Models with Distillation-Based Block Neural Architecture Search
Siao Tang
Xin Wang
Hong Chen
Chaoyu Guan
Yansong Tang
Wenwu Zhu
270
7
0
08 Nov 2023
E3 TTS: Easy End-to-End Diffusion-based Text to Speech
Automatic Speech Recognition & Understanding (ASRU), 2023
Yuan Gao
Nobuyuki Morioka
Yu Zhang
Nanxin Chen
DiffM
354
48
0
02 Nov 2023
Improving End-to-End Speech Processing by Efficient Text Data Utilization with Latent Synthesis
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Jianqiao Lu
Wenyong Huang
Nianzu Zheng
Xingshan Zeng
Y. Yeung
Xiao Chen
SyDa
313
1
0
09 Oct 2023
DiffAR: Denoising Diffusion Autoregressive Model for Raw Speech Waveform Generation
International Conference on Learning Representations (ICLR), 2023
Roi Benita
Michael Elad
Joseph Keshet
DiffM
621
11
0
02 Oct 2023
FastGraphTTS: An Ultrafast Syntax-Aware Speech Synthesis Framework
IEEE International Conference on Tools with Artificial Intelligence (ICTAI), 2023
Jianzong Wang
Xulong Zhang
Aolan Sun
Ning Cheng
Jing Xiao
210
2
0
16 Sep 2023
Matcha-TTS: A fast TTS architecture with conditional flow matching
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Shivam Mehta
Ruibo Tu
Jonas Beskow
Éva Székely
G. Henter
356
208
0
06 Sep 2023
DiCLET-TTS: Diffusion Model based Cross-lingual Emotion Transfer for Text-to-Speech -- A Study between English and Mandarin
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023
Tao Li
Chenxu Hu
Jian Cong
Xinfa Zhu
Jingbei Li
Qiao Tian
Yuping Wang
Linfu Xie
DiffM
260
18
0
02 Sep 2023
VideoGen: A Reference-Guided Latent Diffusion Approach for High Definition Text-to-Video Generation
Xin Li
Wenqing Chu
Ye Wu
Weihang Yuan
Fanglong Liu
Tao Gui
Fu Li
Haocheng Feng
Errui Ding
Jingdong Wang
VGen
392
77
0
01 Sep 2023
DiffV2S: Diffusion-based Video-to-Speech Synthesis with Vision-guided Speaker Embedding
IEEE International Conference on Computer Vision (ICCV), 2023
J. Choi
Joanna Hong
Y. Ro
DiffM
237
33
0
15 Aug 2023
iSTFTNet2: Faster and More Lightweight iSTFT-Based Neural Vocoder Using 1D-2D CNN
Interspeech, 2023
Takuhiro Kaneko
Hirokazu Kameoka
Kou Tanaka
Shogo Seki
203
9
0
14 Aug 2023
DPM-OT: A New Diffusion Probabilistic Model Based on Optimal Transport
IEEE International Conference on Computer Vision (ICCV), 2023
Zezeng Li
Shenghao Li
Zhanpeng Wang
Na Lei
Zhongxuan Luo
Xianfeng Gu
OT DiffM
209
24
0
21 Jul 2023
The Ethical Implications of Generative Audio Models: A Systematic Literature Review
AAAI/ACM Conference on AI, Ethics, and Society (AIES), 2023
J. Barnett
291
52
0
07 Jul 2023
Diff-TTSG: Denoising probabilistic integrated speech and gesture synthesis
Speech Synthesis Workshop (SSW), 2023
Shivam Mehta
Siyang Wang
Simon Alexanderson
Jonas Beskow
Éva Székely
G. Henter
DiffM
373
18
0
15 Jun 2023
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
Neural Information Processing Systems (NeurIPS), 2023
Yinghao Aaron Li
Cong Han
Vinay S. Raghavan
Gavin Mischler
N. Mesgarani
VLM DiffM
360
241
0
13 Jun 2023
HiddenSinger: High-Quality Singing Voice Synthesis via Neural Audio Codec and Latent Diffusion Models
Neural Networks (Neural Netw.), 2023
Ji-Sang Hwang
Sang-Hoon Lee
Seong-Whan Lee
DiffM
227
22
0
12 Jun 2023
Nested Diffusion Processes for Anytime Image Generation
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Noam Elata
Bahjat Kawar
T. Michaeli
Michael Elad
DiffM
406
9
0
30 May 2023
Negative-prompt Inversion: Fast Image Inversion for Editing with Text-guided Diffusion Models
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Daiki Miyake
Akihiro Iohara
Yuriko Saito
Toshiyuki Tanaka
DiffM
304
184
0
26 May 2023
SEEDS: Exponential SDE Solvers for Fast High-Quality Sampling from Diffusion Models
Neural Information Processing Systems (NeurIPS), 2023
Martin Gonzalez
N. Fernández
T. Tran
Elies Gherbi
H. Hajri
N. Masmoudi
DiffM
369
38
0
23 May 2023
Contrastive Energy Prediction for Exact Energy-Guided Diffusion Sampling in Offline Reinforcement Learning
International Conference on Machine Learning (ICML), 2023
Cheng Lu
Huayu Chen
Jianfei Chen
Hang Su
Chongxuan Li
Jun Zhu
DiffM OffRL
355
132
0
25 Apr 2023
DiffVoice: Text-to-Speech with Latent Diffusion
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Zhijun Liu
Yiwei Guo
K. Yu
DiffM
219
27
0
23 Apr 2023
A Survey on Audio Diffusion Models: Text To Speech Synthesis and Enhancement in Generative AI
Chenshuang Zhang
Chaoning Zhang
Sheng Zheng
Mengchun Zhang
Maryam Qamar
Sung-Ho Bae
In So Kweon
DiffM MedIm
337
110
0
23 Mar 2023
A Complete Survey on Generative AI (AIGC): Is ChatGPT from GPT-4 to GPT-5 All You Need?
Chaoning Zhang
Chenshuang Zhang
Sheng Zheng
Yu Qiao
Chenghao Li
...
Lik-Hang Lee
Yang Yang
Heng Tao Shen
In So Kweon
Choong Seon Hong
387
208
0
21 Mar 2023
GECCO: Geometrically-Conditioned Point Diffusion Models
IEEE International Conference on Computer Vision (ICCV), 2023
M. Tyszkiewicz
Pascal Fua
Eduard Trulls
DiffM
357
28
0
10 Mar 2023
Understanding Diffusion Objectives as the ELBO with Simple Data Augmentation
Neural Information Processing Systems (NeurIPS), 2023
Diederik P. Kingma
Ruiqi Gao
DiffM
927
280
0
01 Mar 2023
DIFUSCO: Graph-based Diffusion Solvers for Combinatorial Optimization
Neural Information Processing Systems (NeurIPS), 2023
Zhiqing Sun
Yiming Yang
DiffM
411
254
0
16 Feb 2023
Dual Diffusion Architecture for Fisheye Image Rectification: Synthetic-to-Real Generalization
Shangrong Yang
Chunyu Lin
K. Liao
Yao Zhao
DiffM
257
10
0
26 Jan 2023
Separate And Diffuse: Using a Pretrained Diffusion Model for Improving Source Separation
Shahar Lutati
Eliya Nachmani
Lior Wolf
DiffM
267
23
0
25 Jan 2023
Text-to-speech synthesis based on latent variable conversion using diffusion probabilistic model and variational autoencoder
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Yusuke Yasuda
Tomoki Toda
DiffM
266
9
0
16 Dec 2022
DPM-Solver++: Fast Solver for Guided Sampling of Diffusion Probabilistic Models
Machine Intelligence Research (MIR), 2022
Cheng Lu
Yuhao Zhou
Fan Bao
Jianfei Chen
Chongxuan Li
Jun Zhu
DiffM
1.1K
940
0
02 Nov 2022
LION: Latent Point Diffusion Models for 3D Shape Generation
Neural Information Processing Systems (NeurIPS), 2022
Fangyin Wei
Arash Vahdat
Francis Williams
Zan Gojcic
Or Litany
Sanja Fidler
Karsten Kreis
DiffM
471
662
0
12 Oct 2022
GENIE: Higher-Order Denoising Diffusion Solvers
Neural Information Processing Systems (NeurIPS), 2022
Tim Dockhorn
Arash Vahdat
Karsten Kreis
DiffM
504
149
0
11 Oct 2022
Imagen Video: High Definition Video Generation with Diffusion Models
Jonathan Ho
William Chan
Chitwan Saharia
Jay Whang
Ruiqi Gao
...
Diederik P. Kingma
Ben Poole
Mohammad Norouzi
David J. Fleet
Tim Salimans
VGen
578
1,972
0
05 Oct 2022
R-MelNet: Reduced Mel-Spectral Modeling for Neural TTS
Kyle Kastner
Aaron Courville
206
0
0
30 Jun 2022