arXiv: 2012.03478
Multi-Instrumentalist Net: Unsupervised Generation of Music from Body Movements
7 December 2020
Kun Su
Xiulong Liu
Eli Shlizerman
Papers citing "Multi-Instrumentalist Net: Unsupervised Generation of Music from Body Movements"
21 papers
VGGSounder: Audio-Visual Evaluations for Foundation Models
Daniil Zverev
Thaddäus Wiedemer
Christian Schroeder de Witt
Matthias Bethge
Wieland Brendel
A. Sophia Koepke
AuLLM
11 Aug 2025
Controllable Video-to-Music Generation with Multiple Time-Varying Conditions
Junxian Wu
W. You
H. Zuo
Dengming Zhang
Pei Chen
Lingyun Sun
VGen
28 Jul 2025
A Survey on Cross-Modal Interaction Between Music and Multimodal Data
Sifei Li
Mining Tan
Feier Shen
Minyan Luo
Zijiao Yin
Fan Tang
Weiming Dong
Changsheng Xu
17 Apr 2025
A Survey on Music Generation from Single-Modal, Cross-Modal, and Multi-Modal Perspectives
Shuyu Li
Shulei Ji
Zihao Wang
Songruoyao Wu
Jiaxing Yu
Jianchao Tan
MGen
VGen
01 Apr 2025
Vision-to-Music Generation: A Survey
Zhaokai Wang
Chenxi Bao
Le Zhuo
Jingrui Han
Yang Yue
Yihong Tang
Victor Shea-Jay Huang
Yue Liao
EGVM
VGen
27 Mar 2025
MuVi: Video-to-Music Generation with Semantic Alignment and Rhythmic Synchronization
Ruiqi Li
Siqi Zheng
Xize Cheng
Ziang Zhang
Shengpeng Ji
Zhou Zhao
VGen
16 Oct 2024
From Vision to Audio and Beyond: A Unified Model for Audio-Visual Representation and Generation
International Conference on Machine Learning (ICML), 2024
Kun Su
Xiulong Liu
Eli Shlizerman
VGen
27 Sep 2024
Audio-Visual Generalized Zero-Shot Learning using Pre-Trained Large Multi-Modal Models
David Kurzendörfer
Otniel-Bogdan Mercea
A. Sophia Koepke
Zeynep Akata
VLM
CLIP
09 Apr 2024
The NES Video-Music Database: A Dataset of Symbolic Video Game Music Paired with Gameplay Videos
Igor Cardoso
Rubens O. Moraes
Lucas N. Ferreira
05 Apr 2024
Video2Music: Suitable Music Generation from Videos using an Affective Multimodal Transformer model
Expert Systems with Applications (ESWA), 2023
Jaeyong Kang
Soujanya Poria
Dorien Herremans
MGen
VGen
02 Nov 2023
Tackling Data Bias in MUSIC-AVQA: Crafting a Balanced Dataset for Unbiased Question-Answering
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Xiulong Liu
Zhikang Dong
Peng Zhang
10 Oct 2023
Text-to-feature diffusion for audio-visual few-shot learning
Otniel-Bogdan Mercea
Thomas Hummel
A. Sophia Koepke
Zeynep Akata
VLM
07 Sep 2023
V2Meow: Meowing to the Visual Beat via Video-to-Music Generation
AAAI Conference on Artificial Intelligence (AAAI), 2023
Kun Su
Judith Yue Li
Qingqing Huang
Dima Kuzmin
Joonseok Lee
...
Fei Sha
A. Jansen
Yu Wang
Mauro Verzetti
Timo I. Denk
VGen
11 May 2023
Long-Term Rhythmic Video Soundtracker
International Conference on Machine Learning (ICML), 2023
Jiashuo Yu
Yaohui Wang
Xinyuan Chen
Xiao Sun
Yu Qiao
DiffM
02 May 2023
Conditional Generation of Audio from Video via Foley Analogies
Computer Vision and Pattern Recognition (CVPR), 2023
Yuexi Du
Ziyang Chen
Justin Salamon
Bryan C. Russell
Andrew Owens
VGen
17 Apr 2023
Co-Speech Gesture Synthesis using Discrete Gesture Token Learning
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2023
Shuhong Lu
Youngwoo Yoon
Andrew W. Feng
SLR
04 Mar 2023
Video Background Music Generation: Dataset, Method and Evaluation
IEEE International Conference on Computer Vision (ICCV), 2022
Le Zhuo
Zhaokai Wang
Baisen Wang
Yue Liao
Chenxi Bao
Stanley Peng
Miao Lu
Xiaobo Li
Fei Fang
Si Liu
VGen
21 Nov 2022
Learning in Audio-visual Context: A Review, Analysis, and New Perspective
Yake Wei
Di Hu
Yapeng Tian
Xuelong Li
20 Aug 2022
Temporal and cross-modal attention for audio-visual zero-shot learning
European Conference on Computer Vision (ECCV), 2022
Otniel-Bogdan Mercea
Thomas Hummel
A. Sophia Koepke
Zeynep Akata
20 Jul 2022
Audio-visual Generalised Zero-shot Learning with Cross-modal Attention and Language
Computer Vision and Pattern Recognition (CVPR), 2022
Otniel-Bogdan Mercea
Lukas Riesch
A. Sophia Koepke
Zeynep Akata
07 Mar 2022
Video Background Music Generation with Controllable Music Transformer
Shangzhe Di
Jiang
Sihan Liu
Zhaokai Wang
Leyan Zhu
Zexin He
Hongming Liu
Shuicheng Yan
16 Nov 2021