Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2301.12661
Cited By
Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models
30 January 2023
Rongjie Huang
Jia-Bin Huang
Dongchao Yang
Yi Ren
Luping Liu
Mingze Li
Zhenhui Ye
Jinglin Liu
Xiaoyue Yin
Zhou Zhao
DiffM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models"
35 / 35 papers shown
Title
Rethinking Score Distilling Sampling for 3D Editing and Generation
Xingyu Miao
Haoran Duan
Yang Long
J. Han
30
0
0
03 May 2025
Unconditional Priors Matter! Improving Conditional Generation of Fine-Tuned Diffusion Models
Prin Phunyaphibarn
Phillip Y. Lee
Jaihoon Kim
Minhyuk Sung
DiffM
68
0
0
26 Mar 2025
DexGrasp Anything: Towards Universal Robotic Dexterous Grasping with Physics Awareness
Yiming Zhong
Qi Jiang
Jingyi Yu
Yuexin Ma
46
2
0
11 Mar 2025
On Memorization in Diffusion Models
Xiangming Gu
Chao Du
Tianyu Pang
Chongxuan Li
Min-Bin Lin
Ye Wang
DiffM
TDI
136
43
0
21 Feb 2025
Dual-Flow: Transferable Multi-Target, Instance-Agnostic Attacks via In-the-wild Cascading Flow Optimization
Yixiao Chen
Shikun Sun
Jianshu Li
Ruoyu Li
Zhe Li
Junliang Xing
AAML
71
0
0
04 Feb 2025
VideoShield: Regulating Diffusion-based Video Generation Models via Watermarking
Runyi Hu
J. Zhang
Y. Li
Jiwei Li
Qing-Wu Guo
Han Qiu
Tianwei Zhang
WIGM
VGen
59
4
0
24 Jan 2025
EXION: Exploiting Inter- and Intra-Iteration Output Sparsity for Diffusion Models
Jaehoon Heo
Adiwena Putra
Jieon Yoon
Sungwoong Yune
Hangyeol Lee
Ji-Hoon Kim
Joo-Young Kim
DiffM
36
1
0
10 Jan 2025
Audio-Language Datasets of Scenes and Events: A Survey
Gijs Wijngaard
Elia Formisano
Michele Esposito
M. Dumontier
47
2
0
10 Jan 2025
Text2Data: Low-Resource Data Generation with Textual Control
Shiyu Wang
Yihao Feng
Tian Lan
Ning Yu
Yu Bai
R. Xu
H. Wang
Caiming Xiong
S.
DiffM
63
0
0
03 Jan 2025
TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching and Clap-Ranked Preference Optimization
Chia-Yu Hung
Navonil Majumder
Zhifeng Kong
Ambuj Mehrish
Rafael Valle
Bryan Catanzaro
Soujanya Poria
Bryan Catanzaro
Soujanya Poria
34
4
0
30 Dec 2024
MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis
Ho Kei Cheng
Masato Ishii
Akio Hayakawa
Takashi Shibuya
A. Schwing
Yuki Mitsufuji
VGen
105
12
0
19 Dec 2024
OmniFlow: Any-to-Any Generation with Multi-Modal Rectified Flows
Shufan Li
Konstantinos Kallidromitis
Akash Gokul
Zichun Liao
Yusuke Kato
Kazuki Kozuka
Aditya Grover
VGen
72
5
0
02 Dec 2024
Spider: Any-to-Many Multimodal LLM
Jinxiang Lai
Jie Zhang
Jun Liu
Jian Li
Xiaocheng Lu
Song Guo
MLLM
35
2
0
14 Nov 2024
Human-inspired Perspectives: A Survey on AI Long-term Memory
Zihong He
Weizhe Lin
Hao Zheng
Fan Zhang
Matt Jones
Laurence Aitchison
X. Xu
Miao Liu
Per Ola Kristensson
Junxiao Shen
42
2
0
01 Nov 2024
Multi-Scale Diffusion: Enhancing Spatial Layout in High-Resolution Panoramic Image Generation
Xiaoyu Zhang
Teng Zhou
Xinlong Zhang
Jia Wei
Yongchuan Tang
24
1
0
24 Oct 2024
Truncated Consistency Models
Sangyun Lee
Yilun Xu
Tomas Geffner
Giulia Fanti
Karsten Kreis
Arash Vahdat
Weili Nie
11
3
0
18 Oct 2024
An Online Learning Approach to Prompt-based Selection of Generative Models
Xiaoyan Hu
Ho-fung Leung
Farzan Farnia
26
2
0
17 Oct 2024
Enhancing Robustness in Deep Reinforcement Learning: A Lyapunov Exponent Approach
Rory Young
Nicolas Pugeault
AAML
27
0
0
14 Oct 2024
Audio-Agent: Leveraging LLMs For Audio Generation, Editing and Composition
Zixuan Wang
Chi-Keung Tang
Chi-Keung Tang
DiffM
VGen
LLMAG
18
4
0
04 Oct 2024
FaithEval: Can Your Language Model Stay Faithful to Context, Even If "The Moon is Made of Marshmallows"
Yifei Ming
Senthil Purushwalkam
Shrey Pandit
Zixuan Ke
Xuan-Phi Nguyen
Caiming Xiong
Shafiq R. Joty
HILM
102
16
0
30 Sep 2024
A Simple but Strong Baseline for Sounding Video Generation: Effective Adaptation of Audio and Video Diffusion Models for Joint Generation
Masato Ishii
Akio Hayakawa
Takashi Shibuya
Yuki Mitsufuji
VGen
DiffM
45
4
0
26 Sep 2024
PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions
Weifeng Lin
Xinyu Wei
Renrui Zhang
Le Zhuo
Shitian Zhao
...
Junlin Xie
Junlin Xie
Yu Qiao
Peng Gao
Hongsheng Li
MLLM
DiffM
32
10
0
23 Sep 2024
AudioComposer: Towards Fine-grained Audio Generation with Natural Language Descriptions
Y. Wang
Hangting Chen
Dongchao Yang
Zhiyong Wu
Xixin Wu
DiffM
21
2
0
19 Sep 2024
MambaFoley: Foley Sound Generation using Selective State-Space Models
Marco Furio Colombo
Francesca Ronchini
Luca Comanducci
Fabio Antonacci
Mamba
16
1
0
13 Sep 2024
D&M: Enriching E-commerce Videos with Sound Effects by Key Moment Detection and SFX Matching
Jingyu Liu
Minquan Wang
Ye Ma
Bo Wang
Aozhu Chen
Quan Chen
Peng Jiang
Xirong Li
26
1
0
23 Aug 2024
Atlas Gaussians Diffusion for 3D Generation
Haitao Yang
Yuan Dong
Hanwen Jiang
Dejia Xu
Georgios Pavlakos
Qixing Huang
3DGS
29
3
0
23 Aug 2024
JEN-1: Text-Guided Universal Music Generation with Omnidirectional Diffusion Models
Peike Li
Bo-Yu Chen
Yao Yao
Yikai Wang
Allen Wang
Alex Jinpeng Wang
MGen
VLM
DiffM
41
36
0
09 Aug 2023
HiFi-Codec: Group-residual Vector quantization for High Fidelity Audio Codec
Dongchao Yang
Songxiang Liu
Rongjie Huang
Jinchuan Tian
Chao Weng
Yuexian Zou
138
45
0
04 May 2023
Text-to-Audio Generation using Instruction-Tuned LLM and Latent Diffusion Model
Deepanway Ghosal
Navonil Majumder
Ambuj Mehrish
Soujanya Poria
135
137
0
24 Apr 2023
Audio Retrieval with WavText5K and CLAP Training
Soham Deshmukh
Benjamin Elizalde
Huaming Wang
3DV
CLIP
105
41
0
28 Sep 2022
CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers
Wenyi Hong
Ming Ding
Wendi Zheng
Xinghan Liu
Jie Tang
DiffM
235
556
0
29 May 2022
GenerSpeech: Towards Style Transfer for Generalizable Out-Of-Domain Text-to-Speech
Rongjie Huang
Yi Ren
Jinglin Liu
Chenye Cui
Zhou Zhao
OODD
VLM
104
34
0
15 May 2022
Masked Autoencoders Are Scalable Vision Learners
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT
TPM
255
5,353
0
11 Nov 2021
Why Do We Click: Visual Impression-aware News Recommendation
Jiahao Xun
Shengyu Zhang
Zhou Zhao
Jieming Zhu
Qi Zhang
Jingjie Li
Xiuqiang He
Xiaofei He
Tat-Seng Chua
Fei Wu
88
29
0
26 Sep 2021
Zero-Shot Text-to-Image Generation
Aditya A. Ramesh
Mikhail Pavlov
Gabriel Goh
Scott Gray
Chelsea Voss
Alec Radford
Mark Chen
Ilya Sutskever
VLM
253
3,790
0
24 Feb 2021
1