2004.04730
X3D: Expanding Architectures for Efficient Video Recognition
9 April 2020
Christoph Feichtenhofer
Papers citing
"X3D: Expanding Architectures for Efficient Video Recognition"
50 / 526 papers shown
Title
Lightweight Attentional Feature Fusion: A New Baseline for Text-to-Video Retrieval
Fan Hu
Aozhu Chen
Ziyu Wang
Fangming Zhou
Jianfeng Dong
Xirong Li
14
28
0
03 Dec 2021
BEVT: BERT Pretraining of Video Transformers
Rui Wang
Dongdong Chen
Zuxuan Wu
Yinpeng Chen
Xiyang Dai
Mengchen Liu
Yu-Gang Jiang
Luowei Zhou
Lu Yuan
ViT
12
202
0
02 Dec 2021
MViTv2: Improved Multiscale Vision Transformers for Classification and Detection
Yanghao Li
Chao-Yuan Wu
Haoqi Fan
K. Mangalam
Bo Xiong
Jitendra Malik
Christoph Feichtenhofer
ViT
20
671
0
02 Dec 2021
The Augmented Image Prior: Distilling 1000 Classes by Extrapolating from a Single Image
Yuki M. Asano
Aaqib Saeed
11
7
0
01 Dec 2021
Adaptive Token Sampling For Efficient Vision Transformers
Mohsen Fayyaz
Soroush Abbasi Koohpayegani
F. Jafari
Sunando Sengupta
Hamid Reza Vaezi Joze
Eric Sommerlade
Hamed Pirsiavash
Juergen Gall
ViT
8
100
0
30 Nov 2021
Weakly-guided Self-supervised Pretraining for Temporal Activity Detection
Kumara Kahatapitiya
Zhou Ren
Haoxiang Li
Zhenyu Wu
Michael S. Ryoo
G. Hua
ViT
18
6
0
26 Nov 2021
Learning from Temporal Gradient for Semi-supervised Action Recognition
Junfei Xiao
Longlong Jing
Lin Zhang
Ju He
Qi She
Zongwei Zhou
Alan Yuille
Yingwei Li
4
49
0
25 Nov 2021
PolyViT: Co-training Vision Transformers on Images, Videos and Audio
Valerii Likhosherstov
Anurag Arnab
K. Choromanski
Mario Lucic
Yi Tay
Adrian Weller
Mostafa Dehghani
ViT
23
73
0
25 Nov 2021
MorphMLP: An Efficient MLP-Like Backbone for Spatial-Temporal Representation Learning
David Junhao Zhang
Kunchang Li
Yali Wang
Yuxiang Chen
Shashwat Chandra
Yu Qiao
Luoqi Liu
Mike Zheng Shou
AI4TS
11
30
0
24 Nov 2021
Efficient Video Transformers with Spatial-Temporal Token Selection
Junke Wang
Xitong Yang
Hengduo Li
Li Liu
Zuxuan Wu
Yu-Gang Jiang
ViT
6
63
0
23 Nov 2021
PyTorchVideo: A Deep Learning Library for Video Understanding
Haoqi Fan
Tullie Murrell
Heng Wang
Kalyan Vasudev Alwala
Yanghao Li
...
Ross B. Girshick
Matt Feiszli
Aaron B. Adcock
Wan-Yen Lo
Christoph Feichtenhofer
VLM
ViT
18
49
0
18 Nov 2021
Evaluating Transformers for Lightweight Action Recognition
Raivo Koot
Markus Hennerbichler
Haiping Lu
ViT
14
8
0
18 Nov 2021
Sequence-to-Sequence Modeling for Action Identification at High Temporal Resolution
Aakash Kaku
Kangning Liu
A. Parnandi
H. Rajamohan
Kannan Venkataramanan
Anita Venkatesan
Audre Wirtanen
Natasha Pandit
Heidi M. Schambra
C. Fernandez‐Granda
6
5
0
03 Nov 2021
Relational Self-Attention: What's Missing in Attention for Video Understanding
Manjin Kim
Heeseung Kwon
Chunyu Wang
Suha Kwak
Minsu Cho
ViT
14
28
0
02 Nov 2021
Contrast and Mix: Temporal Contrastive Video Domain Adaptation with Background Mixing
Aadarsh Sahoo
Rutav Shah
Rameswar Panda
Kate Saenko
Abir Das
20
63
0
28 Oct 2021
Temporal-attentive Covariance Pooling Networks for Video Recognition
Zilin Gao
Qilong Wang
Bingbing Zhang
Q. Hu
P. Li
4
24
0
27 Oct 2021
Leveraging Local Temporal Information for Multimodal Scene Classification
Saurabh Sahu
Palash Goyal
ViT
15
0
0
26 Oct 2021
GTM: Gray Temporal Model for Video Recognition
Yanping Zhang
Yongxin Yu
17
0
0
20 Oct 2021
"Knights": First Place Submission for VIPriors21 Action Recognition Challenge at ICCV 2021
Ishan R. Dave
Naman Biyani
Brandon Clark
Rohit Gupta
Y. S. Rawat
M. Shah
ViT
8
3
0
14 Oct 2021
Benchmarking the Robustness of Spatial-Temporal Models Against Corruptions
Chenyu Yi
Siyuan Yang
Haoliang Li
Yap-Peng Tan
Alex C. Kot
10
31
0
13 Oct 2021
TAda! Temporally-Adaptive Convolutions for Video Understanding
Ziyuan Huang
Shiwei Zhang
Liang Pan
Zhiwu Qing
Mingqian Tang
Ziwei Liu
M. Ang
29
49
0
12 Oct 2021
Video Is Graph: Structured Graph Module for Video Action Recognition
Rongjie Li
Xiaojun Wu
Tianyang Xu
22
12
0
12 Oct 2021
Joint Learning On The Hierarchy Representation for Fine-Grained Human Action Recognition
M. C. Leong
Hui Li Tan
Haosong Zhang
Liyuan Li
Feng Lin
J. Lim
19
10
0
12 Oct 2021
Hierarchical Modeling for Task Recognition and Action Segmentation in Weakly-Labeled Instructional Videos
Reza Ghoddoosian
S. Sayed
V. Athitsos
14
15
0
12 Oct 2021
Towards Streaming Egocentric Action Anticipation
Antonino Furnari
G. Farinella
EgoV
12
6
0
11 Oct 2021
TSM: Temporal Shift Module for Efficient and Scalable Video Understanding on Edge Device
Ji Lin
Chuang Gan
Kuan-Chieh Jackson Wang
Song Han
32
64
0
27 Sep 2021
ActionCLIP: A New Paradigm for Video Action Recognition
Mengmeng Wang
Jiazheng Xing
Yong Liu
VLM
141
261
0
17 Sep 2021
Overview of Tencent Multi-modal Ads Video Understanding Challenge
Zhenzhi Wang
Liyu Wu
Zhimin Li
Jiangfeng Xiong
Qinglin Lu
11
4
0
16 Sep 2021
Efficient Action Recognition Using Confidence Distillation
Shervin Manzuri Shalmani
Fei Chiang
Ronghuo Zheng
4
6
0
05 Sep 2021
Revisiting 3D ResNets for Video Recognition
Xianzhi Du
Yeqing Li
Yin Cui
Rui Qian
Jing Li
Irwan Bello
43
17
0
03 Sep 2021
LIGAR: Lightweight General-purpose Action Recognition
Evgeny Izutov
8
3
0
30 Aug 2021
Searching for Two-Stream Models in Multivariate Space for Video Recognition
Xinyu Gong
Heng Wang
Zheng Shou
Matt Feiszli
Zhangyang Wang
Zhicheng Yan
14
9
0
30 Aug 2021
Shifted Chunk Transformer for Spatio-Temporal Representational Learning
Xuefan Zha
Wentao Zhu
Tingxun Lv
Sen Yang
Ji Liu
AI4TS
ViT
28
26
0
26 Aug 2021
Identity-aware Graph Memory Network for Action Detection
Jingcheng Ni
Jie Qin
Di Huang
10
9
0
26 Aug 2021
Dynamic Network Quantization for Efficient Video Inference
Ximeng Sun
Rameswar Panda
Chun-Fu Chen
A. Oliva
Rogerio Feris
Kate Saenko
18
45
0
23 Aug 2021
MobileStereoNet: Towards Lightweight Deep Networks for Stereo Matching
Faranak Shamsafar
Samuel Woerz
Rafia Rahim
A. Zell
3DV
17
85
0
22 Aug 2021
MM-ViT: Multi-Modal Video Transformer for Compressed Video Action Recognition
Jiawei Chen
C. Ho
ViT
11
61
0
20 Aug 2021
Elaborative Rehearsal for Zero-shot Action Recognition
Shizhe Chen
Dong Huang
VLM
14
79
0
05 Aug 2021
Token Shift Transformer for Video Classification
Hao Zhang
Y. Hao
Chong-Wah Ngo
ViT
18
115
0
05 Aug 2021
UNIK: A Unified Framework for Real-world Skeleton-based Action Recognition
Di Yang
Yaohui Wang
A. Dantcheva
Lorenzo Garattoni
Gianpiero Francesca
F. Brémond
13
38
0
19 Jul 2021
VideoLightFormer: Lightweight Action Recognition using Transformers
Raivo Koot
Haiping Lu
ViT
6
5
0
01 Jul 2021
When Video Classification Meets Incremental Classes
Hanbin Zhao
Xin Qin
Shihao Su
Yongjian Fu
Zibo Lin
Xi Li
CLL
16
23
0
30 Jun 2021
Can An Image Classifier Suffice For Action Recognition?
Quanfu Fan
Chun-Fu Chen
Rameswar Panda
ViT
19
24
0
26 Jun 2021
Video Swin Transformer
Ze Liu
Jia Ning
Yue Cao
Yixuan Wei
Zheng-Wei Zhang
Stephen Lin
Han Hu
ViT
27
1,426
0
24 Jun 2021
Towards Long-Form Video Understanding
Chao-Yuan Wu
Philipp Krahenbuhl
VLM
ViT
28
164
0
21 Jun 2021
TokenLearner: What Can 8 Learned Tokens Do for Images and Videos?
Michael S. Ryoo
A. Piergiovanni
Anurag Arnab
Mostafa Dehghani
A. Angelova
ViT
9
127
0
21 Jun 2021
Long-Short Temporal Contrastive Learning of Video Transformers
Jue Wang
Gedas Bertasius
Du Tran
Lorenzo Torresani
VLM
ViT
8
50
0
17 Jun 2021
Gradient Forward-Propagation for Large-Scale Temporal Video Modelling
Mateusz Malinowski
Dimitrios Vytiniotis
G. Swirszcz
Viorica Patraucean
João Carreira
11
6
0
15 Jun 2021
Space-time Mixing Attention for Video Transformer
Adrian Bulat
Juan-Manuel Perez-Rua
Swathikiran Sudhakaran
Brais Martínez
Georgios Tzimiropoulos
ViT
11
112
0
10 Jun 2021
Keeping Your Eye on the Ball: Trajectory Attention in Video Transformers
Mandela Patrick
Dylan Campbell
Yuki M. Asano
Ishan Misra
Florian Metze
Christoph Feichtenhofer
Andrea Vedaldi
João F. Henriques
8
230
0
09 Jun 2021