Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2402.17723
Cited By
Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners
27 February 2024
Yazhou Xing
Yin-Yin He
Zeyue Tian
Xintao Wang
Qifeng Chen
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners"
17 / 17 papers shown
Title
OmniAudio: Generating Spatial Audio from 360-Degree Video
Huadai Liu
Tianyi Luo
Qikai Jiang
Kaicheng Luo
Peiwen Sun
...
X. Li
Shiliang Zhang
Zhijie Yan
Zhou Zhao
Wei Xue
VGen
51
0
0
21 Apr 2025
AudioX: Diffusion Transformer for Anything-to-Audio Generation
Zeyue Tian
Yizhu Jin
Zhaoyang Liu
Ruibin Yuan
Xu Tan
Qifeng Chen
Wei Xue
Y. Guo
65
3
0
13 Mar 2025
MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis
Ho Kei Cheng
Masato Ishii
Akio Hayakawa
Takashi Shibuya
A. Schwing
Yuki Mitsufuji
VGen
126
12
0
19 Dec 2024
Enhancing Robustness in Deep Reinforcement Learning: A Lyapunov Exponent Approach
Rory Young
Nicolas Pugeault
AAML
57
3
0
14 Oct 2024
Audio-Agent: Leveraging LLMs For Audio Generation, Editing and Composition
Zixuan Wang
Chi-Keung Tang
Chi-Keung Tang
DiffM
VGen
LLMAG
38
4
0
04 Oct 2024
MDSGen: Fast and Efficient Masked Diffusion Temporal-Aware Transformers for Open-Domain Sound Generation
T. Pham
Tri Ton
Chang D. Yoo
36
3
0
03 Oct 2024
A Simple but Strong Baseline for Sounding Video Generation: Effective Adaptation of Audio and Video Diffusion Models for Joint Generation
Masato Ishii
Akio Hayakawa
Takashi Shibuya
Yuki Mitsufuji
VGen
DiffM
63
4
0
26 Sep 2024
STA-V2A: Video-to-Audio Generation with Semantic and Temporal Alignment
Yong Ren
Chenxing Li
Manjie Xu
Wei Liang
Yu Gu
Rilin Chen
Dong Yu
VGen
DiffM
43
6
0
13 Sep 2024
TALE: Training-free Cross-domain Image Composition via Adaptive Latent Manipulation and Energy-guided Optimization
K. T. Pham
Jingye Chen
Qifeng Chen
DiffM
34
0
0
07 Aug 2024
Video-to-Audio Generation with Hidden Alignment
Manjie Xu
Chenxing Li
Yong Ren
Rilin Chen
Yu Gu
Yu Gu
Dong Yu
Dong Yu
DiffM
VGen
43
11
0
10 Jul 2024
Read, Watch and Scream! Sound Generation from Text and Video
Yujin Jeong
Yunji Kim
Sanghyuk Chun
Jiyoung Lee
VGen
DiffM
25
11
0
08 Jul 2024
VidMuse: A Simple Video-to-Music Generation Framework with Long-Short-Term Modeling
Zeyue Tian
Zhaoyang Liu
Ruibin Yuan
Jiahao Pan
Xiaoqiang Huang
Xu Tan
Xu Tan
Qifeng Chen
Y. Guo
VGen
97
16
0
06 Jun 2024
Text-to-Audio Generation using Instruction-Tuned LLM and Latent Diffusion Model
Deepanway Ghosal
Navonil Majumder
Ambuj Mehrish
Soujanya Poria
138
137
0
24 Apr 2023
GALIP: Generative Adversarial CLIPs for Text-to-Image Synthesis
Ming Tao
Bingkun Bao
Hao Tang
Changsheng Xu
DiffM
VLM
58
99
0
30 Jan 2023
Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models
Rongjie Huang
Jia-Bin Huang
Dongchao Yang
Yi Ren
Luping Liu
Mingze Li
Zhenhui Ye
Jinglin Liu
Xiaoyue Yin
Zhou Zhao
DiffM
140
304
0
30 Jan 2023
Audio-to-Image Cross-Modal Generation
Maciej Żelaszczyk
Jacek Mañdziuk
DiffM
46
15
0
27 Sep 2021
Sound2Sight: Generating Visual Dynamics from Sound and Context
A. Cherian
Moitreya Chatterjee
N. Ahuja
VGen
64
35
0
23 Jul 2020
1