ResearchTrend.AI
Audio Generation with Multiple Conditional Diffusion Model

23 August 2023
Zhifang Guo
Jianguo Mao
Ruijie Tao
Long Yan
Kazushige Ouchi
Hong Liu
Xiangdong Wang
    DiffM

Papers citing "Audio Generation with Multiple Conditional Diffusion Model"

10 / 10 papers shown
Audio-Language Datasets of Scenes and Events: A Survey
Gijs Wijngaard
Elia Formisano
Michele Esposito
M. Dumontier
74
2
0
10 Jan 2025
AudioComposer: Towards Fine-grained Audio Generation with Natural Language Descriptions
Y. Wang
Hangting Chen
Dongchao Yang
Zhiyong Wu
Xixin Wu
DiffM
40
2
0
19 Sep 2024
Read, Watch and Scream! Sound Generation from Text and Video
Yujin Jeong
Yunji Kim
Sanghyuk Chun
Jiyoung Lee
VGen
DiffM
25
11
0
08 Jul 2024
Text-to-Audio Generation using Instruction-Tuned LLM and Latent Diffusion Model
Deepanway Ghosal
Navonil Majumder
Ambuj Mehrish
Soujanya Poria
135
137
0
24 Apr 2023
Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models
Rongjie Huang
Jia-Bin Huang
Dongchao Yang
Yi Ren
Luping Liu
Mingze Li
Zhenhui Ye
Jinglin Liu
Xiaoyue Yin
Zhou Zhao
DiffM
137
304
0
30 Jan 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM
MLLM
244
4,186
0
30 Jan 2023
Audio-to-Image Cross-Modal Generation
Maciej Żelaszczyk
Jacek Mańdziuk
DiffM
46
12
0
27 Sep 2021
Pix2seq: A Language Modeling Framework for Object Detection
Ting Chen
Saurabh Saxena
Lala Li
David J. Fleet
Geoffrey E. Hinton
MLLM
ViT
VLM
233
341
0
22 Sep 2021
Decoupling Magnitude and Phase Estimation with Deep ResUNet for Music Source Separation
Qiuqiang Kong
Yin Cao
Haohe Liu
Keunwoo Choi
Yuxuan Wang
116
95
0
12 Sep 2021
Image-to-Image Translation with Conditional Adversarial Networks
Phillip Isola
Jun-Yan Zhu
Tinghui Zhou
Alexei A. Efros
SSeg
209
19,191
0
21 Nov 2016