ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2207.09983
  4. Cited By
Diffsound: Discrete Diffusion Model for Text-to-sound Generation

Diffsound: Discrete Diffusion Model for Text-to-sound Generation

20 July 2022
Dongchao Yang
Jianwei Yu
Helin Wang
Wen Wang
Chao Weng
Yuexian Zou
Dong Yu
    DiffM
ArXivPDFHTML

Papers citing "Diffsound: Discrete Diffusion Model for Text-to-sound Generation"

33 / 33 papers shown
Title
Denoising Diffusion Probabilistic Models for Coastal Inundation Forecasting
Denoising Diffusion Probabilistic Models for Coastal Inundation Forecasting
Kazi Ashik Islam
Zakaria Mehrab
Mahantesh Halappanavar
H. Mortveit
Sridhar Katragadda
Jon Derek Loftis
Madhav V. Marathe
DiffM
AI4CE
37
0
0
08 May 2025
A Survey on Music Generation from Single-Modal, Cross-Modal, and Multi-Modal Perspectives
A Survey on Music Generation from Single-Modal, Cross-Modal, and Multi-Modal Perspectives
Shuyu Li
Shulei Ji
Zihao W. Wang
Songruoyao Wu
Jiaxing Yu
K. Zhang
MGen
VGen
63
1
0
01 Apr 2025
DiffGAP: A Lightweight Diffusion Module in Contrastive Space for Bridging Cross-Model Gap
DiffGAP: A Lightweight Diffusion Module in Contrastive Space for Bridging Cross-Model Gap
Shentong Mo
Zehua Chen
Fan Bao
Jun-Jie Zhu
DiffM
50
0
0
15 Mar 2025
Bayesian Computation in Deep Learning
Bayesian Computation in Deep Learning
Wenlong Chen
Bolian Li
Ruqi Zhang
Yingzhen Li
BDL
70
0
0
25 Feb 2025
Simplified and Generalized Masked Diffusion for Discrete Data
Simplified and Generalized Masked Diffusion for Discrete Data
Jiaxin Shi
Kehang Han
Z. Wang
Arnaud Doucet
Michalis K. Titsias
DiffM
74
62
0
17 Jan 2025
Audio-Language Datasets of Scenes and Events: A Survey
Audio-Language Datasets of Scenes and Events: A Survey
Gijs Wijngaard
Elia Formisano
Michele Esposito
M. Dumontier
79
2
0
10 Jan 2025
Text2Data: Low-Resource Data Generation with Textual Control
Text2Data: Low-Resource Data Generation with Textual Control
Shiyu Wang
Yihao Feng
Tian Lan
Ning Yu
Yu Bai
R. Xu
H. Wang
Caiming Xiong
S.
DiffM
80
0
0
03 Jan 2025
Spider: Any-to-Many Multimodal LLM
Spider: Any-to-Many Multimodal LLM
Jinxiang Lai
Jie Zhang
Jun Liu
Jian Li
Xiaocheng Lu
Song Guo
MLLM
54
2
0
14 Nov 2024
Multi-Scale Diffusion: Enhancing Spatial Layout in High-Resolution Panoramic Image Generation
Multi-Scale Diffusion: Enhancing Spatial Layout in High-Resolution Panoramic Image Generation
Xiaoyu Zhang
Teng Zhou
Xinlong Zhang
Jia Wei
Yongchuan Tang
42
1
0
24 Oct 2024
How Discrete and Continuous Diffusion Meet: Comprehensive Analysis of Discrete Diffusion Models via a Stochastic Integral Framework
How Discrete and Continuous Diffusion Meet: Comprehensive Analysis of Discrete Diffusion Models via a Stochastic Integral Framework
Yinuo Ren
Haoxuan Chen
Grant M. Rotskoff
Lexing Ying
33
3
0
04 Oct 2024
DNI: Dilutional Noise Initialization for Diffusion Video Editing
DNI: Dilutional Noise Initialization for Diffusion Video Editing
Sunjae Yoon
Gwanhyeong Koo
Ji Woo Hong
Chang D. Yoo
DiffM
31
2
0
19 Sep 2024
AudioComposer: Towards Fine-grained Audio Generation with Natural Language Descriptions
AudioComposer: Towards Fine-grained Audio Generation with Natural Language Descriptions
Y. Wang
Hangting Chen
Dongchao Yang
Zhiyong Wu
Xixin Wu
DiffM
40
2
0
19 Sep 2024
MambaFoley: Foley Sound Generation using Selective State-Space Models
MambaFoley: Foley Sound Generation using Selective State-Space Models
Marco Furio Colombo
Francesca Ronchini
Luca Comanducci
Fabio Antonacci
Mamba
20
1
0
13 Sep 2024
Read, Watch and Scream! Sound Generation from Text and Video
Read, Watch and Scream! Sound Generation from Text and Video
Yujin Jeong
Yunji Kim
Sanghyuk Chun
Jiyoung Lee
VGen
DiffM
25
11
0
08 Jul 2024
PAGURI: a user experience study of creative interaction with
  text-to-music models
PAGURI: a user experience study of creative interaction with text-to-music models
Francesca Ronchini
Luca Comanducci
Gabriele Perego
Fabio Antonacci
26
2
0
05 Jul 2024
EmoAttack: Emotion-to-Image Diffusion Models for Emotional Backdoor Generation
EmoAttack: Emotion-to-Image Diffusion Models for Emotional Backdoor Generation
Tianyu Wei
Shanmin Pang
Qi Guo
Yizhuo Ma
Qing Guo
Ming-Ming Cheng
Qing Guo
58
2
0
22 Jun 2024
LAFMA: A Latent Flow Matching Model for Text-to-Audio Generation
LAFMA: A Latent Flow Matching Model for Text-to-Audio Generation
Wenhao Guan
K. Wang
Wangjin Zhou
Yang Wang
Feng Deng
Hui Wang
Lin Li
Q. Hong
Yong Qin
DiffM
28
3
0
12 Jun 2024
VidMuse: A Simple Video-to-Music Generation Framework with Long-Short-Term Modeling
VidMuse: A Simple Video-to-Music Generation Framework with Long-Short-Term Modeling
Zeyue Tian
Zhaoyang Liu
Ruibin Yuan
Jiahao Pan
Xiaoqiang Huang
Xu Tan
Xu Tan
Qifeng Chen
Y. Guo
VGen
97
16
0
06 Jun 2024
LiveSpeech: Low-Latency Zero-shot Text-to-Speech via Autoregressive
  Modeling of Audio Discrete Codes
LiveSpeech: Low-Latency Zero-shot Text-to-Speech via Autoregressive Modeling of Audio Discrete Codes
Trung D. Q. Dang
David Aponte
Dung Tran
K. Koishida
34
3
0
05 Jun 2024
AudioSetMix: Enhancing Audio-Language Datasets with LLM-Assisted
  Augmentations
AudioSetMix: Enhancing Audio-Language Datasets with LLM-Assisted Augmentations
David Xu
21
2
0
17 May 2024
Denoising Task Difficulty-based Curriculum for Training Diffusion Models
Denoising Task Difficulty-based Curriculum for Training Diffusion Models
Jin-Young Kim
Hyojun Go
Soonwoo Kwon
Hyun-Gyoon Kim
DiffM
46
6
0
15 Mar 2024
SonicVisionLM: Playing Sound with Vision Language Models
SonicVisionLM: Playing Sound with Vision Language Models
Zhifeng Xie
Shengye Yu
Qile He
Mengtian Li
VLM
VGen
22
2
0
09 Jan 2024
AI-Generated Content (AIGC) for Various Data Modalities: A Survey
AI-Generated Content (AIGC) for Various Data Modalities: A Survey
Lin Geng Foo
Hossein Rahmani
J. Liu
62
31
0
27 Aug 2023
JEN-1: Text-Guided Universal Music Generation with Omnidirectional Diffusion Models
JEN-1: Text-Guided Universal Music Generation with Omnidirectional Diffusion Models
Peike Li
Bo-Yu Chen
Yao Yao
Yikai Wang
Allen Wang
Alex Jinpeng Wang
MGen
VLM
DiffM
57
37
0
09 Aug 2023
Squeezing Large-Scale Diffusion Models for Mobile
Squeezing Large-Scale Diffusion Models for Mobile
Jiwoong Choi
Minkyu Kim
Daehyun Ahn
Taesu Kim
Yulhwa Kim
Do-Hyun Jo
H. Jeon
Jae-Joon Kim
Hyungjun Kim
13
9
0
03 Jul 2023
DiffUCD:Unsupervised Hyperspectral Image Change Detection with Semantic
  Correlation Diffusion Model
DiffUCD:Unsupervised Hyperspectral Image Change Detection with Semantic Correlation Diffusion Model
Xiangrong Zhang
Shunli Tian
Guanchun Wang
Huiyu Zhou
Licheng Jiao
DiffM
33
5
0
21 May 2023
A-CAP: Anticipation Captioning with Commonsense Knowledge
A-CAP: Anticipation Captioning with Commonsense Knowledge
D. Vo
Quoc-An Luong
Akihiro Sugimoto
Hideki Nakayama
19
2
0
13 Apr 2023
DiffTAD: Temporal Action Detection with Proposal Denoising Diffusion
DiffTAD: Temporal Action Detection with Proposal Denoising Diffusion
Sauradip Nag
Xiatian Zhu
Jiankang Deng
Yi-Zhe Song
Tao Xiang
DiffM
VGen
25
21
0
27 Mar 2023
PDPP: Projected Diffusion for Procedure Planning in Instructional Videos
PDPP: Projected Diffusion for Procedure Planning in Instructional Videos
Hanlin Wang
Yilu Wu
Sheng Guo
Limin Wang
VGen
DiffM
63
30
0
26 Mar 2023
NoreSpeech: Knowledge Distillation based Conditional Diffusion Model for
  Noise-robust Expressive TTS
NoreSpeech: Knowledge Distillation based Conditional Diffusion Model for Noise-robust Expressive TTS
Dongchao Yang
Songxiang Liu
Jianwei Yu
Helin Wang
Chao Weng
Yuexian Zou
DiffM
VLM
29
18
0
04 Nov 2022
Zero-Shot Text-to-Image Generation
Zero-Shot Text-to-Image Generation
Aditya A. Ramesh
Mikhail Pavlov
Gabriel Goh
Scott Gray
Chelsea Voss
Alec Radford
Mark Chen
Ilya Sutskever
VLM
253
4,735
0
24 Feb 2021
Argmax Flows and Multinomial Diffusion: Learning Categorical
  Distributions
Argmax Flows and Multinomial Diffusion: Learning Categorical Distributions
Emiel Hoogeboom
Didrik Nielsen
P. Jaini
Patrick Forré
Max Welling
DiffM
202
392
0
10 Feb 2021
Image-to-Image Translation with Conditional Adversarial Networks
Image-to-Image Translation with Conditional Adversarial Networks
Phillip Isola
Jun-Yan Zhu
Tinghui Zhou
Alexei A. Efros
SSeg
212
19,191
0
21 Nov 2016
1