ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2004.04730
  4. Cited By
X3D: Expanding Architectures for Efficient Video Recognition

X3D: Expanding Architectures for Efficient Video Recognition

9 April 2020
Christoph Feichtenhofer
ArXivPDFHTML

Papers citing "X3D: Expanding Architectures for Efficient Video Recognition"

50 / 526 papers shown
Title
Lightweight Attentional Feature Fusion: A New Baseline for Text-to-Video
  Retrieval
Lightweight Attentional Feature Fusion: A New Baseline for Text-to-Video Retrieval
Fan Hu
Aozhu Chen
Ziyu Wang
Fangming Zhou
Jianfeng Dong
Xirong Li
14
28
0
03 Dec 2021
BEVT: BERT Pretraining of Video Transformers
BEVT: BERT Pretraining of Video Transformers
Rui Wang
Dongdong Chen
Zuxuan Wu
Yinpeng Chen
Xiyang Dai
Mengchen Liu
Yu-Gang Jiang
Luowei Zhou
Lu Yuan
ViT
12
202
0
02 Dec 2021
MViTv2: Improved Multiscale Vision Transformers for Classification and
  Detection
MViTv2: Improved Multiscale Vision Transformers for Classification and Detection
Yanghao Li
Chaoxia Wu
Haoqi Fan
K. Mangalam
Bo Xiong
Jitendra Malik
Christoph Feichtenhofer
ViT
20
671
0
02 Dec 2021
The Augmented Image Prior: Distilling 1000 Classes by Extrapolating from
  a Single Image
The Augmented Image Prior: Distilling 1000 Classes by Extrapolating from a Single Image
Yuki M. Asano
Aaqib Saeed
11
7
0
01 Dec 2021
Adaptive Token Sampling For Efficient Vision Transformers
Adaptive Token Sampling For Efficient Vision Transformers
Mohsen Fayyaz
Soroush Abbasi Koohpayegani
F. Jafari
Sunando Sengupta
Hamid Reza Vaezi Joze
Eric Sommerlade
Hamed Pirsiavash
Juergen Gall
ViT
8
100
0
30 Nov 2021
Weakly-guided Self-supervised Pretraining for Temporal Activity
  Detection
Weakly-guided Self-supervised Pretraining for Temporal Activity Detection
Kumara Kahatapitiya
Zhou Ren
Haoxiang Li
Zhenyu Wu
Michael S. Ryoo
G. Hua
ViT
18
6
0
26 Nov 2021
Learning from Temporal Gradient for Semi-supervised Action Recognition
Learning from Temporal Gradient for Semi-supervised Action Recognition
Junfei Xiao
Longlong Jing
Lin Zhang
Ju He
Qi She
Zongwei Zhou
Alan Yuille
Yingwei Li
4
49
0
25 Nov 2021
PolyViT: Co-training Vision Transformers on Images, Videos and Audio
PolyViT: Co-training Vision Transformers on Images, Videos and Audio
Valerii Likhosherstov
Anurag Arnab
K. Choromanski
Mario Lucic
Yi Tay
Adrian Weller
Mostafa Dehghani
ViT
23
73
0
25 Nov 2021
MorphMLP: An Efficient MLP-Like Backbone for Spatial-Temporal
  Representation Learning
MorphMLP: An Efficient MLP-Like Backbone for Spatial-Temporal Representation Learning
David Junhao Zhang
Kunchang Li
Yali Wang
Yuxiang Chen
Shashwat Chandra
Yu Qiao
Luoqi Liu
Mike Zheng Shou
AI4TS
11
30
0
24 Nov 2021
Efficient Video Transformers with Spatial-Temporal Token Selection
Efficient Video Transformers with Spatial-Temporal Token Selection
Junke Wang
Xitong Yang
Hengduo Li
Li Liu
Zuxuan Wu
Yu-Gang Jiang
ViT
6
63
0
23 Nov 2021
PyTorchVideo: A Deep Learning Library for Video Understanding
PyTorchVideo: A Deep Learning Library for Video Understanding
Haoqi Fan
Tullie Murrell
Heng Wang
Kalyan Vasudev Alwala
Yanghao Li
...
Ross B. Girshick
Matt Feiszli
Aaron B. Adcock
Wan-Yen Lo
Christoph Feichtenhofer
VLM
ViT
18
49
0
18 Nov 2021
Evaluating Transformers for Lightweight Action Recognition
Evaluating Transformers for Lightweight Action Recognition
Raivo Koot
Markus Hennerbichler
Haiping Lu
ViT
14
8
0
18 Nov 2021
Sequence-to-Sequence Modeling for Action Identification at High Temporal
  Resolution
Sequence-to-Sequence Modeling for Action Identification at High Temporal Resolution
Aakash Kaku
Kangning Liu
A. Parnandi
H. Rajamohan
Kannan Venkataramanan
Anita Venkatesan
Audre Wirtanen
Natasha Pandit
Heidi M. Schambra
C. Fernandez‐Granda
6
5
0
03 Nov 2021
Relational Self-Attention: What's Missing in Attention for Video
  Understanding
Relational Self-Attention: What's Missing in Attention for Video Understanding
Manjin Kim
Heeseung Kwon
Chunyu Wang
Suha Kwak
Minsu Cho
ViT
14
28
0
02 Nov 2021
Contrast and Mix: Temporal Contrastive Video Domain Adaptation with
  Background Mixing
Contrast and Mix: Temporal Contrastive Video Domain Adaptation with Background Mixing
Aadarsh Sahoo
Rutav Shah
Rameswar Panda
Kate Saenko
Abir Das
20
63
0
28 Oct 2021
Temporal-attentive Covariance Pooling Networks for Video Recognition
Temporal-attentive Covariance Pooling Networks for Video Recognition
Zilin Gao
Qilong Wang
Bingbing Zhang
Q. Hu
P. Li
4
24
0
27 Oct 2021
Leveraging Local Temporal Information for Multimodal Scene
  Classification
Leveraging Local Temporal Information for Multimodal Scene Classification
Saurabh Sahu
Palash Goyal
ViT
15
0
0
26 Oct 2021
GTM: Gray Temporal Model for Video Recognition
GTM: Gray Temporal Model for Video Recognition
Yanping Zhang
Yongxin Yu
17
0
0
20 Oct 2021
"Knights": First Place Submission for VIPriors21 Action Recognition
  Challenge at ICCV 2021
"Knights": First Place Submission for VIPriors21 Action Recognition Challenge at ICCV 2021
Ishan R. Dave
Naman Biyani
Brandon Clark
Rohit Gupta
Y. S. Rawat
M. Shah
ViT
8
3
0
14 Oct 2021
Benchmarking the Robustness of Spatial-Temporal Models Against
  Corruptions
Benchmarking the Robustness of Spatial-Temporal Models Against Corruptions
Chenyu Yi
Siyuan Yang
Haoliang Li
Yap-Peng Tan
Alex C. Kot
10
31
0
13 Oct 2021
TAda! Temporally-Adaptive Convolutions for Video Understanding
TAda! Temporally-Adaptive Convolutions for Video Understanding
Ziyuan Huang
Shiwei Zhang
Liang Pan
Zhiwu Qing
Mingqian Tang
Ziwei Liu
M. Ang
29
49
0
12 Oct 2021
Video Is Graph: Structured Graph Module for Video Action Recognition
Video Is Graph: Structured Graph Module for Video Action Recognition
Rongjie Li
Xiaojun Wu
Tianyang Xu
22
12
0
12 Oct 2021
Joint Learning On The Hierarchy Representation for Fine-Grained Human
  Action Recognition
Joint Learning On The Hierarchy Representation for Fine-Grained Human Action Recognition
M. C. Leong
Hui Li Tan
Haosong Zhang
Liyuan Li
Feng Lin
J. Lim
19
10
0
12 Oct 2021
Hierarchical Modeling for Task Recognition and Action Segmentation in
  Weakly-Labeled Instructional Videos
Hierarchical Modeling for Task Recognition and Action Segmentation in Weakly-Labeled Instructional Videos
Reza Ghoddoosian
S. Sayed
V. Athitsos
14
15
0
12 Oct 2021
Towards Streaming Egocentric Action Anticipation
Towards Streaming Egocentric Action Anticipation
Antonino Furnari
G. Farinella
EgoV
12
6
0
11 Oct 2021
TSM: Temporal Shift Module for Efficient and Scalable Video
  Understanding on Edge Device
TSM: Temporal Shift Module for Efficient and Scalable Video Understanding on Edge Device
Ji Lin
Chuang Gan
Kuan-Chieh Jackson Wang
Song Han
32
64
0
27 Sep 2021
ActionCLIP: A New Paradigm for Video Action Recognition
ActionCLIP: A New Paradigm for Video Action Recognition
Mengmeng Wang
Jiazheng Xing
Yong Liu
VLM
141
261
0
17 Sep 2021
Overview of Tencent Multi-modal Ads Video Understanding Challenge
Overview of Tencent Multi-modal Ads Video Understanding Challenge
Zhenzhi Wang
Liyu Wu
Zhimin Li
Jiangfeng Xiong
Qinglin Lu
11
4
0
16 Sep 2021
Efficient Action Recognition Using Confidence Distillation
Efficient Action Recognition Using Confidence Distillation
Shervin Manzuri Shalmani
Fei Chiang
Ronghuo Zheng
4
6
0
05 Sep 2021
Revisiting 3D ResNets for Video Recognition
Revisiting 3D ResNets for Video Recognition
Xianzhi Du
Yeqing Li
Yin Cui
Rui Qian
Jing Li
Irwan Bello
43
17
0
03 Sep 2021
LIGAR: Lightweight General-purpose Action Recognition
LIGAR: Lightweight General-purpose Action Recognition
Evgeny Izutov
8
3
0
30 Aug 2021
Searching for Two-Stream Models in Multivariate Space for Video
  Recognition
Searching for Two-Stream Models in Multivariate Space for Video Recognition
Xinyu Gong
Heng Wang
Zheng Shou
Matt Feiszli
Zhangyang Wang
Zhicheng Yan
14
9
0
30 Aug 2021
Shifted Chunk Transformer for Spatio-Temporal Representational Learning
Shifted Chunk Transformer for Spatio-Temporal Representational Learning
Xuefan Zha
Wentao Zhu
Tingxun Lv
Sen Yang
Ji Liu
AI4TS
ViT
28
26
0
26 Aug 2021
Identity-aware Graph Memory Network for Action Detection
Identity-aware Graph Memory Network for Action Detection
Jingcheng Ni
Jie Qin
Di Huang
10
9
0
26 Aug 2021
Dynamic Network Quantization for Efficient Video Inference
Dynamic Network Quantization for Efficient Video Inference
Ximeng Sun
Rameswar Panda
Chun-Fu Chen
A. Oliva
Rogerio Feris
Kate Saenko
18
45
0
23 Aug 2021
MobileStereoNet: Towards Lightweight Deep Networks for Stereo Matching
MobileStereoNet: Towards Lightweight Deep Networks for Stereo Matching
Faranak Shamsafar
Samuel Woerz
Rafia Rahim
A. Zell
3DV
17
85
0
22 Aug 2021
MM-ViT: Multi-Modal Video Transformer for Compressed Video Action
  Recognition
MM-ViT: Multi-Modal Video Transformer for Compressed Video Action Recognition
Jiawei Chen
C. Ho
ViT
11
61
0
20 Aug 2021
Elaborative Rehearsal for Zero-shot Action Recognition
Elaborative Rehearsal for Zero-shot Action Recognition
Shizhe Chen
Dong Huang
VLM
14
79
0
05 Aug 2021
Token Shift Transformer for Video Classification
Token Shift Transformer for Video Classification
Hao Zhang
Y. Hao
Chong-Wah Ngo
ViT
18
115
0
05 Aug 2021
UNIK: A Unified Framework for Real-world Skeleton-based Action
  Recognition
UNIK: A Unified Framework for Real-world Skeleton-based Action Recognition
Di Yang
Yaohui Wang
A. Dantcheva
Lorenzo Garattoni
Gianpiero Francesca
F. Brémond
13
38
0
19 Jul 2021
VideoLightFormer: Lightweight Action Recognition using Transformers
Raivo Koot
Haiping Lu
ViT
6
5
0
01 Jul 2021
When Video Classification Meets Incremental Classes
When Video Classification Meets Incremental Classes
Hanbin Zhao
Xin Qin
Shihao Su
Yongjian Fu
Zibo Lin
Xi Li
CLL
16
23
0
30 Jun 2021
Can An Image Classifier Suffice For Action Recognition?
Can An Image Classifier Suffice For Action Recognition?
Quanfu Fan
Chun-Fu Chen
Chen
Rameswar Panda
ViT
19
24
0
26 Jun 2021
Video Swin Transformer
Video Swin Transformer
Ze Liu
Jia Ning
Yue Cao
Yixuan Wei
Zheng-Wei Zhang
Stephen Lin
Han Hu
ViT
27
1,426
0
24 Jun 2021
Towards Long-Form Video Understanding
Towards Long-Form Video Understanding
Chaoxia Wu
Philipp Krahenbuhl
VLM
ViT
28
164
0
21 Jun 2021
TokenLearner: What Can 8 Learned Tokens Do for Images and Videos?
TokenLearner: What Can 8 Learned Tokens Do for Images and Videos?
Michael S. Ryoo
A. Piergiovanni
Anurag Arnab
Mostafa Dehghani
A. Angelova
ViT
9
127
0
21 Jun 2021
Long-Short Temporal Contrastive Learning of Video Transformers
Long-Short Temporal Contrastive Learning of Video Transformers
Jue Wang
Gedas Bertasius
Du Tran
Lorenzo Torresani
VLM
ViT
8
50
0
17 Jun 2021
Gradient Forward-Propagation for Large-Scale Temporal Video Modelling
Gradient Forward-Propagation for Large-Scale Temporal Video Modelling
Mateusz Malinowski
Dimitrios Vytiniotis
G. Swirszcz
Viorica Patraucean
João Carreira
11
6
0
15 Jun 2021
Space-time Mixing Attention for Video Transformer
Space-time Mixing Attention for Video Transformer
Adrian Bulat
Juan-Manuel Perez-Rua
Swathikiran Sudhakaran
Brais Martínez
Georgios Tzimiropoulos
ViT
11
112
0
10 Jun 2021
Keeping Your Eye on the Ball: Trajectory Attention in Video Transformers
Keeping Your Eye on the Ball: Trajectory Attention in Video Transformers
Mandela Patrick
Dylan Campbell
Yuki M. Asano
Ishan Misra
Ishan Misra Florian Metze
Christoph Feichtenhofer
Andrea Vedaldi
João F. Henriques
8
230
0
09 Jun 2021
Previous
123...101189
Next