2108.09322
MM-ViT: Multi-Modal Video Transformer for Compressed Video Action Recognition
20 August 2021
Jiawei Chen
C. Ho
ViT
Papers citing
"MM-ViT: Multi-Modal Video Transformer for Compressed Video Action Recognition"
15 / 15 papers shown
NarrativeBridge: Enhancing Video Captioning with Causal-Temporal Narrative
Asmar Nadeem
Faegheh Sardari
R. Dawes
Syed Sameed Husain
Adrian Hilton
Armin Mustafa
44
4
0
10 Jun 2024
Towards Robust Multimodal Prompting With Missing Modalities
Jaehyuk Jang
Yooseung Wang
Changick Kim
VLM
17
10
0
26 Dec 2023
HIG: Hierarchical Interlacement Graph Approach to Scene Graph Generation in Video Understanding
Trong-Thuan Nguyen
Pha Nguyen
Khoa Luu
10
12
0
05 Dec 2023
Deep Neural Networks in Video Human Action Recognition: A Review
Zihan Wang
Yang Yang
Zhi Liu
Y. Zheng
38
4
0
25 May 2023
Multi-view knowledge distillation transformer for human action recognition
Yi Lin
Vincent S. Tseng
ViT
8
1
0
25 Mar 2023
Towards Continual Egocentric Activity Recognition: A Multi-modal Egocentric Activity Dataset for Continual Learning
Linfeng Xu
Qingbo Wu
Lili Pan
Fanman Meng
Hongliang Li
Chiyuan He
Hanxin Wang
Shaoxu Cheng
Yunshu Dai
EgoV
HAI
17
23
0
26 Jan 2023
A Survey on Human Action Recognition
Zhou Shuchang
16
0
0
20 Dec 2022
TransDARC: Transformer-based Driver Activity Recognition with Latent Space Feature Calibration
Kunyu Peng
Alina Roitberg
Kailun Yang
Jiaming Zhang
Rainer Stiefelhagen
ViT
21
32
0
02 Mar 2022
CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval
Huaishao Luo
Lei Ji
Ming Zhong
Yang Chen
Wen Lei
Nan Duan
Tianrui Li
CLIP
VLM
301
771
0
18 Apr 2021
Is Space-Time Attention All You Need for Video Understanding?
Gedas Bertasius
Heng Wang
Lorenzo Torresani
ViT
275
1,939
0
09 Feb 2021
Video Transformer Network
Daniel Neimark
Omri Bar
Maya Zohar
Dotan Asselmann
ViT
191
375
0
01 Feb 2021
Decoupling the Role of Data, Attention, and Losses in Multimodal Transformers
Lisa Anne Hendricks
John F. J. Mellor
R. Schneider
Jean-Baptiste Alayrac
Aida Nematzadeh
75
110
0
31 Jan 2021
Multi-modal Transformer for Video Retrieval
Valentin Gabeur
Chen Sun
Alahari Karteek
Cordelia Schmid
ViT
396
532
0
21 Jul 2020
Skeleton-based Action Recognition Using LSTM and CNN
Chuankun Li
Pichao Wang
Shuang Wang
Yonghong Hou
W. Li
HAI
26
162
0
06 Jul 2017
Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding
Akira Fukui
Dong Huk Park
Daylen Yang
Anna Rohrbach
Trevor Darrell
Marcus Rohrbach
139
1,458
0
06 Jun 2016