ResearchTrend.AI

Rethink Cross-Modal Fusion in Weakly-Supervised Audio-Visual Video Parsing

14 November 2023
Yating Xu
Conghui Hu
Gim Hee Lee

Papers citing "Rethink Cross-Modal Fusion in Weakly-Supervised Audio-Visual Video Parsing"

2 / 2 papers shown
Title
Omnivore: A Single Model for Many Visual Modalities
Rohit Girdhar
Mannat Singh
Nikhil Ravi
Laurens van der Maaten
Armand Joulin
Ishan Misra
209
225
0
20 Jan 2022
Multi-modal Transformer for Video Retrieval
Valentin Gabeur
Chen Sun
Karteek Alahari
Cordelia Schmid
ViT
410
594
0
21 Jul 2020