2210.05060
Cited By
AVE-CLIP: AudioCLIP-based Multi-window Temporal Transformer for Audio Visual Event Localization
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2022
11 October 2022
Tanvir Mahmud
Diana Marculescu
CLIP
Papers citing
"AVE-CLIP: AudioCLIP-based Multi-window Temporal Transformer for Audio Visual Event Localization"
25 / 25 papers shown
Title
Real-Time Inference for Distributed Multimodal Systems under Communication Delay Uncertainty
Victor Croisfelt
João Henrique Inacio de Souza
Shashi Raj Pandey
B. Soret
P. Popovski
153
0
0
20 Nov 2025
Energy-Efficient Domain-Specific Artificial Intelligence Models and Agents: Pathways and Paradigms
Abhijit Chatterjee
N. Jha
Jonathan D. Cohen
Thomas Griffiths
Hongjing Lu
Diana Marculescu
Ashiqur Rasul
Keshab K. Parhi
LLMAG
AI4CE
328
0
0
24 Oct 2025
CLASP: Cross-modal Salient Anchor-based Semantic Propagation for Weakly-supervised Dense Audio-Visual Event Localization
Jinxing Zhou
Ziheng Zhou
Yanghao Zhou
Yuxin Mao
Zhangling Duan
Dan Guo
100
1
0
06 Aug 2025
GRAM: Spatial general-purpose audio representation models for real-world applications
Goksenin Yuksel
Marcel van Gerven
Kiki van der Heijden
194
1
0
01 Jun 2025
PreFM: Online Audio-Visual Event Parsing via Predictive Future Modeling
X. Yu
Yan Fang
Xiaojie Jin
Yao Zhao
Yunchao Wei
231
1
0
29 May 2025
Hearing and Seeing Through CLIP: A Framework for Self-Supervised Sound Source Localization
Sooyoung Park
Arda Senocak
Joon Son Chung
VLM
229
0
0
08 May 2025
Adapting to the Unknown: Training-Free Audio-Visual Event Perception with Dynamic Thresholds
Computer Vision and Pattern Recognition (CVPR), 2025
E. Shaar
Ariel Shaulov
Gal Chechik
Lior Wolf
VLM
283
1
0
17 Mar 2025
Towards Open-Vocabulary Audio-Visual Event Localization
Computer Vision and Pattern Recognition (CVPR), 2024
Jinxing Zhou
Dan Guo
Ruohao Guo
Yuxin Mao
Jingjing Hu
Yiran Zhong
Xiaojun Chang
Meng Wang
VLM
439
19
0
18 Nov 2024
SaSR-Net: Source-Aware Semantic Representation Network for Enhancing Audio-Visual Question Answering
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Tianyu Yang
Yiyang Nan
Lisen Dai
Zhenwen Liang
Yapeng Tian
Wei Wei
248
1
0
07 Nov 2024
CACE-Net: Co-guidance Attention and Contrastive Enhancement for Effective Audio-Visual Event Localization
ACM Multimedia (MM), 2024
Xiang He
Xiangxi Liu
Yang Li
Dongcheng Zhao
Guobin Shen
Qingqun Kong
Xin Yang
Yi Zeng
212
12
0
04 Aug 2024
MA-AVT: Modality Alignment for Parameter-Efficient Audio-Visual Transformers
Tanvir Mahmud
Shentong Mo
Yapeng Tian
Diana Marculescu
142
7
0
07 Jun 2024
Advancing Weakly-Supervised Audio-Visual Video Parsing via Segment-wise Pseudo Labeling
Jinxing Zhou
Dan Guo
Yiran Zhong
Meng Wang
VLM
215
33
0
03 Jun 2024
OmniBind: Teach to Build Unequal-Scale Modality Interaction for Omni-Bind of All
Yuanhuiyi Lyu
Xueye Zheng
Dahun Kim
Lin Wang
216
20
0
25 May 2024
Siamese Vision Transformers are Scalable Audio-visual Learners
Yan-Bo Lin
Gedas Bertasius
203
10
0
28 Mar 2024
UniBind: LLM-Augmented Unified and Balanced Representation Space to Bind Them All
Yuanhuiyi Lyu
Xueye Zheng
Jiazhou Zhou
Lin Wang
196
39
0
19 Mar 2024
Audio-Visual Segmentation via Unlabeled Frame Exploitation
Jinxiang Liu
Yikun Liu
Fei Zhang
Chen Ju
Ya Zhang
Yanfeng Wang
251
25
0
17 Mar 2024
Image Anything: Towards Reasoning-coherent and Training-free Multi-modal Image Generation
Yuanhuiyi Lyu
Xueye Zheng
Lin Wang
DiffM
185
12
0
31 Jan 2024
Cross-modal Prompts: Adapting Large Pre-trained Models for Audio-Visual Downstream Tasks
Haoyi Duan
Yan Xia
Mingze Zhou
Li Tang
Jieming Zhu
Zhou Zhao
VLM
258
38
0
09 Nov 2023
Can CLIP Help Sound Source Localization?
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Sooyoung Park
Arda Senocak
Joon Son Chung
145
15
0
07 Nov 2023
EventBind: Learning a Unified Representation to Bind Them All for Event-based Open-world Understanding
European Conference on Computer Vision (ECCV), 2023
Jiazhou Zhou
Xueye Zheng
Yuanhuiyi Lyu
Lin Wang
VLM
305
26
0
06 Aug 2023
Revisit Weakly-Supervised Audio-Visual Video Parsing from the Language Perspective
Neural Information Processing Systems (NeurIPS), 2023
Yingying Fan
Yu Wu
Bo Du
Yutian Lin
241
17
0
01 Jun 2023
Improving Audio-Visual Video Parsing with Pseudo Visual Labels
Jinxing Zhou
Dan Guo
Yiran Zhong
Meng Wang
VLM
195
21
0
04 Mar 2023
Vision Transformers are Parameter-Efficient Audio-Visual Learners
Computer Vision and Pattern Recognition (CVPR), 2022
Yan-Bo Lin
Yi-Lin Sung
Jie Lei
Joey Tianyi Zhou
Gedas Bertasius
268
106
0
15 Dec 2022
Leveraging the Video-level Semantic Consistency of Event for Audio-visual Event Localization
IEEE transactions on multimedia (IEEE TMM), 2022
Yuanyuan Jiang
Jianqin Yin
Yonghao Dang
106
14
0
11 Oct 2022
Xception: Deep Learning with Depthwise Separable Convolutions
Computer Vision and Pattern Recognition (CVPR), 2016
François Chollet
MDE
BDL
PINN
2.5K
16,498
0
07 Oct 2016