arXiv:2201.04676
UniFormer: Unified Transformer for Efficient Spatiotemporal Representation Learning
12 January 2022
Kunchang Li
Yali Wang
Peng Gao
Guanglu Song
Yu Liu
Hongsheng Li
Yu Qiao
ViT
Papers citing
"UniFormer: Unified Transformer for Efficient Spatiotemporal Representation Learning"
41 / 141 papers shown
Degenerate Swin to Win: Plain Window-based Transformer without Sophisticated Operations
Tan Yu
Ping Li
ViT
25
5
0
25 Nov 2022
SVFormer: Semi-supervised Video Transformer for Action Recognition
Zhen Xing
Qi Dai
Hang-Rui Hu
Jingjing Chen
Zuxuan Wu
Yu-Gang Jiang
ViT
19
67
0
23 Nov 2022
Vision Transformer with Super Token Sampling
Huaibo Huang
Xiaoqiang Zhou
Jie Cao
Ran He
T. Tan
ViT
11
54
0
21 Nov 2022
UniFormerV2: Spatiotemporal Learning by Arming Image ViTs with Video UniFormer
Kunchang Li
Yali Wang
Yinan He
Yizhuo Li
Yi Wang
Limin Wang
Yu Qiao
ViT
15
106
0
17 Nov 2022
ParCNetV2: Oversized Kernel with Enhanced Attention
Ruihan Xu
Haokui Zhang
Wenze Hu
Shiliang Zhang
Xiaoyu Wang
ViT
9
6
0
14 Nov 2022
Learning a Condensed Frame for Memory-Efficient Video Class-Incremental Learning
Yixuan Pei
Zhiwu Qing
Jun Cen
Xiang Wang
Shiwei Zhang
Yaxiong Wang
Mingqian Tang
Nong Sang
Xueming Qian
16
13
0
02 Nov 2022
MetaFormer Baselines for Vision
Weihao Yu
Chenyang Si
Pan Zhou
Mi Luo
Yichen Zhou
Jiashi Feng
Shuicheng Yan
Xinchao Wang
MoE
10
155
0
24 Oct 2022
Low-Resolution Action Recognition for Tiny Actions Challenge
Bo-You Chen
Yu Qiao
Yali Wang
19
0
0
28 Sep 2022
On the Surprising Effectiveness of Transformers in Low-Labeled Video Recognition
Farrukh Rahman
Ömer Mubarek
Z. Kira
ViT
10
2
0
15 Sep 2022
EchoCoTr: Estimation of the Left Ventricular Ejection Fraction from Spatiotemporal Echocardiography
Rand Muhtaseb
Mohammad Yaqub
ViT
11
24
0
09 Sep 2022
Video Mobile-Former: Video Recognition with Efficient Global Spatial-temporal Modeling
Rui Wang
Zuxuan Wu
Dongdong Chen
Yinpeng Chen
Xiyang Dai
Mengchen Liu
Luowei Zhou
Lu Yuan
Yu-Gang Jiang
ViT
27
4
0
25 Aug 2022
Frozen CLIP Models are Efficient Video Learners
Ziyi Lin
Shijie Geng
Renrui Zhang
Peng Gao
Gerard de Melo
Xiaogang Wang
Jifeng Dai
Yu Qiao
Hongsheng Li
CLIP
VLM
10
199
0
06 Aug 2022
MAR: Masked Autoencoders for Efficient Action Recognition
Zhiwu Qing
Shiwei Zhang
Ziyuan Huang
Xiang Wang
Yuehuang Wang
Yiliang Lv
Changxin Gao
Nong Sang
13
42
0
24 Jul 2022
An Efficient Spatio-Temporal Pyramid Transformer for Action Detection
Yuetian Weng
Zizheng Pan
Mingfei Han
Xiaojun Chang
Bohan Zhuang
ViT
6
25
0
21 Jul 2022
Time Is MattEr: Temporal Self-supervision for Video Transformers
Sukmin Yun
Jaehyung Kim
Dongyoon Han
Hwanjun Song
Jung-Woo Ha
Jinwoo Shin
ViT
15
12
0
19 Jul 2022
Multi-manifold Attention for Vision Transformers
D. Konstantinidis
Ilias Papastratis
K. Dimitropoulos
P. Daras
ViT
6
16
0
18 Jul 2022
VidConv: A modernized 2D ConvNet for Efficient Video Recognition
Chuong H. Nguyen
Su Huynh
Vinh Nguyen
Ngoc-Khanh Nguyen
ViT
16
3
0
08 Jul 2022
MVP: Robust Multi-View Practice for Driving Action Localization
Jingjie Shang
Kunchang Li
Kaibin Tian
Haisheng Su
Yangguang Li
13
3
0
05 Jul 2022
Exploring Temporally Dynamic Data Augmentation for Video Recognition
Taeoh Kim
Jinhyung Kim
Minho Shim
Sangdoo Yun
Myunggu Kang
Dongyoon Wee
Sangyoun Lee
AI4TS
15
10
0
30 Jun 2022
EATFormer: Improving Vision Transformer Inspired by Evolutionary Algorithm
Jiangning Zhang
Xiangtai Li
Yabiao Wang
Chengjie Wang
Yibo Yang
Yong Liu
Dacheng Tao
ViT
28
32
0
19 Jun 2022
Surface Analysis with Vision Transformers
Simon Dahan
Logan Z. J. Williams
Abdulah Fawaz
Daniel Rueckert
E. C. Robinson
ViT
MedIm
12
2
0
31 May 2022
AdaptFormer: Adapting Vision Transformers for Scalable Visual Recognition
Shoufa Chen
Chongjian Ge
Zhan Tong
Jiangliu Wang
Yibing Song
Jue Wang
Ping Luo
141
631
0
26 May 2022
Inception Transformer
Chenyang Si
Weihao Yu
Pan Zhou
Yichen Zhou
Xinchao Wang
Shuicheng Yan
ViT
8
186
0
25 May 2022
EdgeViTs: Competing Light-weight CNNs on Mobile Devices with Vision Transformers
Junting Pan
Adrian Bulat
Fuwen Tan
Xiatian Zhu
L. Dudziak
Hongsheng Li
Georgios Tzimiropoulos
Brais Martínez
ViT
10
178
0
06 May 2022
In Defense of Image Pre-Training for Spatiotemporal Recognition
Xianhang Li
Huiyu Wang
Chen Wei
Jieru Mei
Alan Yuille
Yuyin Zhou
Cihang Xie
14
0
0
03 May 2022
Unified GCNs: Towards Connecting GCNs with CNNs
Ziyan Zhang
Bo Jiang
Bin Luo
GNN
20
1
0
26 Apr 2022
ResT V2: Simpler, Faster and Stronger
Qing-Long Zhang
Yubin Yang
ViT
17
24
0
15 Apr 2022
Active Token Mixer
Guoqiang Wei
Zhizheng Zhang
Cuiling Lan
Yan Lu
Zhibo Chen
8
15
0
11 Mar 2022
Video Transformers: A Survey
Javier Selva
A. S. Johansen
Sergio Escalera
Kamal Nasrollahi
T. Moeslund
Albert Clapés
ViT
20
101
0
16 Jan 2022
SeqFormer: Sequential Transformer for Video Instance Segmentation
Junfeng Wu
Yi-Xin Jiang
S. Bai
Wenqing Zhang
Xiang Bai
ViT
6
98
0
15 Dec 2021
DualFormer: Local-Global Stratified Transformer for Efficient Video Recognition
Yuxuan Liang
Pan Zhou
Roger Zimmermann
Shuicheng Yan
ViT
15
21
0
09 Dec 2021
Efficient Video Transformers with Spatial-Temporal Token Selection
Junke Wang
Xitong Yang
Hengduo Li
Li Liu
Zuxuan Wu
Yu-Gang Jiang
ViT
14
63
0
23 Nov 2021
VidTr: Video Transformer Without Convolutions
Yanyi Zhang
Xinyu Li
Chunhui Liu
Bing Shuai
Yi Zhu
Biagio Brattoli
Hao Chen
I. Marsic
Joseph Tighe
ViT
127
193
0
23 Apr 2021
Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions
Wenhai Wang
Enze Xie
Xiang Li
Deng-Ping Fan
Kaitao Song
Ding Liang
Tong Lu
Ping Luo
Ling Shao
ViT
263
3,538
0
24 Feb 2021
High-Performance Large-Scale Image Recognition Without Normalization
Andrew Brock
Soham De
Samuel L. Smith
Karen Simonyan
VLM
220
510
0
11 Feb 2021
Is Space-Time Attention All You Need for Video Understanding?
Gedas Bertasius
Heng Wang
Lorenzo Torresani
ViT
278
1,939
0
09 Feb 2021
Video Transformer Network
Daniel Neimark
Omri Bar
Maya Zohar
Dotan Asselmann
ViT
193
419
0
01 Feb 2021
Bottleneck Transformers for Visual Recognition
A. Srinivas
Tsung-Yi Lin
Niki Parmar
Jonathon Shlens
Pieter Abbeel
Ashish Vaswani
SLR
267
955
0
27 Jan 2021
Human Action Recognition from Various Data Modalities: A Review
Zehua Sun
Qiuhong Ke
Hossein Rahmani
Mohammed Bennamoun
Gang Wang
Jun Liu
MU
30
492
0
22 Dec 2020
How Much Position Information Do Convolutional Neural Networks Encode?
Md. Amirul Islam
Sen Jia
Neil D. B. Bruce
SSL
189
343
0
22 Jan 2020
Grouped Spatial-Temporal Aggregation for Efficient Action Recognition
Chenxu Luo
Alan Yuille
113
149
0
28 Sep 2019