Extending Segment Anything Model into Auditory and Temporal Dimensions for Audio-Visual Segmentation

10 June 2024

Papers citing "Extending Segment Anything Model into Auditory and Temporal Dimensions for Audio-Visual Segmentation"

Title
CATR: Combinatorial-Dependence Audio-Queried Transformer for Audio-Visual Video Segmentation | Kexin Li, Zongxin Yang, Lei Chen, Yezhou Yang, Jun Xiao | VOS | 28 49 0 | 18 Sep 2023
AV-SAM: Segment Anything Model Meets Audio-Visual Localization and Segmentation | Shentong Mo, Yapeng Tian | VLM | 79 47 0 | 03 May 2023
Masked Autoencoders Are Scalable Vision Learners | Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, Ross B. Girshick | ViT, TPM | 258 7,337 0 | 11 Nov 2021
Is Space-Time Attention All You Need for Video Understanding? | Gedas Bertasius, Heng Wang, Lorenzo Torresani | ViT | 275 1,939 0 | 09 Feb 2021