Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2312.01017
Cited By
Unveiling the Power of Audio-Visual Early Fusion Transformers with Dense Interactions through Masked Modeling
2 December 2023
Shentong Mo
Pedro Morgado
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Unveiling the Power of Audio-Visual Early Fusion Transformers with Dense Interactions through Masked Modeling"
15 / 15 papers shown
Title
OpenAVS: Training-Free Open-Vocabulary Audio Visual Segmentation with Foundational Models
Shengkai Chen
Yifang Yin
Jinming Cao
Shili Xiang
Zhenguang Liu
Roger Zimmermann
VOS
VLM
37
0
0
30 Apr 2025
DiffGAP: A Lightweight Diffusion Module in Contrastive Space for Bridging Cross-Model Gap
Shentong Mo
Zehua Chen
Fan Bao
Jun-Jie Zhu
DiffM
50
0
0
15 Mar 2025
From Prototypes to General Distributions: An Efficient Curriculum for Masked Image Modeling
Jinhong Lin
Cheng-En Wu
Huanran Li
Jifan Zhang
Yu Hen Hu
Pedro Morgado
23
0
0
16 Nov 2024
Aligning Audio-Visual Joint Representations with an Agentic Workflow
Shentong Mo
Yibing Song
21
0
0
30 Oct 2024
MMP: Towards Robust Multi-Modal Learning with Masked Modality Projection
Niki Nezakati
Md Kaykobad Reza
Ameya Patil
Mashhour Solh
M. Salman Asif
27
1
0
03 Oct 2024
Multi-scale Multi-instance Visual Sound Localization and Segmentation
Shentong Mo
Haofan Wang
25
2
0
31 Aug 2024
Audio-visual Generalized Zero-shot Learning the Easy Way
Shentong Mo
Pedro Morgado
22
1
0
18 Jul 2024
Semantic Grouping Network for Audio Source Separation
Shentong Mo
Yapeng Tian
28
4
0
04 Jul 2024
Unified Video-Language Pre-training with Synchronized Audio
Shentong Mo
Haofan Wang
Huaxia Li
Xu Tang
27
1
0
12 May 2024
Text-to-Audio Generation Synchronized with Videos
Shentong Mo
Jing Shi
Yapeng Tian
DiffM
VGen
28
17
0
08 Mar 2024
Weakly-Supervised Audio-Visual Segmentation
Shentong Mo
Bhiksha Raj
VOS
28
12
0
25 Nov 2023
AV-SAM: Segment Anything Model Meets Audio-Visual Localization and Segmentation
Shentong Mo
Yapeng Tian
VLM
79
47
0
03 May 2023
Audio-Visual Segmentation with Semantics
Jinxing Zhou
Xuyang Shen
Jianyuan Wang
Jiayi Zhang
Weixuan Sun
...
Stan Birchfield
Dan Guo
Lingpeng Kong
Meng Wang
Yiran Zhong
VOS
35
37
0
30 Jan 2023
A Closer Look at Weakly-Supervised Audio-Visual Source Localization
Shentong Mo
Pedro Morgado
69
64
0
30 Aug 2022
Masked Autoencoders Are Scalable Vision Learners
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT
TPM
258
7,337
0
11 Nov 2021
1