arXiv:2311.08151
Rethink Cross-Modal Fusion in Weakly-Supervised Audio-Visual Video Parsing
14 November 2023
Yating Xu
Conghui Hu
Gim Hee Lee
Papers citing "Rethink Cross-Modal Fusion in Weakly-Supervised Audio-Visual Video Parsing"
2 / 2 papers shown
Omnivore: A Single Model for Many Visual Modalities
Rohit Girdhar, Mannat Singh, Nikhil Ravi, L. V. D. van der Maaten, Armand Joulin, Ishan Misra
209
225
0
20 Jan 2022
Multi-modal Transformer for Video Retrieval
Valentin Gabeur, Chen Sun, Alahari Karteek, Cordelia Schmid
ViT
410
594
0
21 Jul 2020