2001.08740
Audiovisual SlowFast Networks for Video Recognition
23 January 2020
Fanyi Xiao
Yong Jae Lee
Kristen Grauman
Jitendra Malik
Christoph Feichtenhofer
Papers citing
"Audiovisual SlowFast Networks for Video Recognition"
23 / 23 papers shown
Title
Multimodal Emotion Recognition using Audio-Video Transformer Fusion with Cross Attention
Joe Dhanith
Shravan Venkatraman
Modigari Narendra
Vigya Sharma
Santhosh Malarvannan
69
0
0
20 Feb 2025
NarrativeBridge: Enhancing Video Captioning with Causal-Temporal Narrative
Asmar Nadeem
Faegheh Sardari
R. Dawes
Syed Sameed Husain
Adrian Hilton
Armin Mustafa
44
4
0
10 Jun 2024
AudioRepInceptionNeXt: A lightweight single-stream architecture for efficient audio recognition
Kin Wai Lau
Yasar Abbas Ur Rehman
L. Po
14
1
0
21 Apr 2024
X-MIC: Cross-Modal Instance Conditioning for Egocentric Action Generalization
Anna Kukleva
Fadime Sener
Edoardo Remelli
Bugra Tekin
Eric Sauser
Bernt Schiele
Shugao Ma
VLM
EgoV
23
1
0
28 Mar 2024
Multimodal Action Quality Assessment
Ling-an Zeng
Wei-Shi Zheng
40
11
0
31 Jan 2024
Collaboratively Self-supervised Video Representation Learning for Action Recognition
Jie M. Zhang
Zhifan Wan
Lanqing Hu
Stephen Lin
Shuzhe Wu
Shiguang Shan
TTA
49
0
0
15 Jan 2024
OmniVec: Learning robust representations with cross modal sharing
Siddharth Srivastava
Gaurav Sharma
SSL
13
64
0
07 Nov 2023
SkeleTR: Towards Skeleton-based Action Recognition in the Wild
Haodong Duan
Mingze Xu
Bing Shuai
Davide Modolo
Zhuowen Tu
Joseph Tighe
Alessandro Bergamo
ViT
23
1
0
20 Sep 2023
ARGUS: Visualization of AI-Assisted Task Guidance in AR
Sonia Castelo
Joao Rulff
Erin McGowan
Bea Steers
Guande Wu
...
Qinghong Sun
Huy Q. Vo
J. P. Bello
M. Krone
Claudio Silva
14
18
0
11 Aug 2023
Looking Similar, Sounding Different: Leveraging Counterfactual Cross-Modal Pairs for Audiovisual Representation Learning
Nikhil Singh
Chih-Wei Wu
Iroro Orife
Mahdi M. Kalayeh
20
2
0
12 Apr 2023
Decomposed Cross-modal Distillation for RGB-based Temporal Action Detection
Pilhyeon Lee
Taeoh Kim
Minho Shim
Dongyoon Wee
H. Byun
10
11
0
30 Mar 2023
Look, Listen, and Attack: Backdoor Attacks Against Video Action Recognition
Hasan Hammoud
Shuming Liu
Mohammad Alkhrashi
Fahad Albalawi
Bernard Ghanem
AAML
13
8
0
03 Jan 2023
A Survey on Human Action Recognition
Zhou Shuchang
16
0
0
20 Dec 2022
PMR: Prototypical Modal Rebalance for Multimodal Learning
Yunfeng Fan
Wenchao Xu
Haozhao Wang
Junxiao Wang
Song Guo
16
60
0
14 Nov 2022
Exploring Fine-Grained Audiovisual Categorization with the SSW60 Dataset
Grant Van Horn
Rui Qian
Kimberly Wilber
Hartwig Adam
Oisin Mac Aodha
Serge J. Belongie
9
10
0
21 Jul 2022
Single-Stream Multi-Level Alignment for Vision-Language Pretraining
Zaid Khan
B. Vijaykumar
Xiang Yu
S. Schulter
Manmohan Chandraker
Y. Fu
CLIP
VLM
17
16
0
27 Mar 2022
Egocentric Deep Multi-Channel Audio-Visual Active Speaker Localization
Hao Jiang
Calvin Murdock
V. Ithapu
EgoV
17
40
0
06 Jan 2022
Perceiver: General Perception with Iterative Attention
Andrew Jaegle
Felix Gimeno
Andrew Brock
Andrew Zisserman
Oriol Vinyals
João Carreira
VLM
ViT
MDE
13
970
0
04 Mar 2021
There is More than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking with Sound by Distilling Multimodal Knowledge
Francisco Rivera Valverde
Juana Valeria Hurtado
Abhinav Valada
15
72
0
01 Mar 2021
ECO: Efficient Convolutional Network for Online Video Understanding
Mohammadreza Zolfaghari
Kamaljeet Singh
Thomas Brox
116
495
0
24 Apr 2018
Lip Reading Sentences in the Wild
Joon Son Chung
A. Senior
Oriol Vinyals
Andrew Zisserman
151
782
0
16 Nov 2016
Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding
Akira Fukui
Dong Huk Park
Daylen Yang
Anna Rohrbach
Trevor Darrell
Marcus Rohrbach
139
1,458
0
06 Jun 2016
Improving neural networks by preventing co-adaptation of feature detectors
Geoffrey E. Hinton
Nitish Srivastava
A. Krizhevsky
Ilya Sutskever
Ruslan Salakhutdinov
VLM
237
7,597
0
03 Jul 2012