Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2106.13230
Cited By
Video Swin Transformer
24 June 2021
Ze Liu
Jia Ning
Yue Cao
Yixuan Wei
Zheng-Wei Zhang
Stephen Lin
Han Hu
ViT
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Video Swin Transformer"
50 / 153 papers shown
Title
EVA: Exploring the Limits of Masked Visual Representation Learning at Scale
Yuxin Fang
Wen Wang
Binhui Xie
Quan-Sen Sun
Ledell Yu Wu
Xinggang Wang
Tiejun Huang
Xinlong Wang
Yue Cao
VLM
CLIP
31
671
0
14 Nov 2022
Grafting Vision Transformers
Jong Sung Park
Kumara Kahatapitiya
Donghyun Kim
Shivchander Sudalairaj
Quanfu Fan
Michael S. Ryoo
ViT
13
2
0
28 Oct 2022
GliTr: Glimpse Transformers with Spatiotemporal Consistency for Online Action Prediction
Samrudhdhi B. Rangrej
Kevin J Liang
Tal Hassner
James J. Clark
20
3
0
24 Oct 2022
Transfer-learning for video classification: Video Swin Transformer on multiple domains
Daniel de Oliveira
D. Matos
ViT
22
0
0
18 Oct 2022
Self-supervised Video Representation Learning with Motion-Aware Masked Autoencoders
Haosen Yang
Deng Huang
Bin Wen
Jiannan Wu
H. Yao
Yi-Xin Jiang
Xiatian Zhu
Zehuan Yuan
21
19
0
09 Oct 2022
Expediting Large-Scale Vision Transformer for Dense Prediction without Fine-tuning
Weicong Liang
Yuhui Yuan
Henghui Ding
Xiao Luo
Weihong Lin
Ding Jia
Zheng-Wei Zhang
Chao Zhang
Hanhua Hu
10
25
0
03 Oct 2022
An In-depth Study of Stochastic Backpropagation
J. Fang
Ming Xu
Hao Chen
Bing Shuai
Z. Tu
Joseph Tighe
BDL
22
1
0
30 Sep 2022
DMTNet: Dynamic Multi-scale Network for Dual-pixel Images Defocus Deblurring with Transformer
Dafeng Zhang
Xiaobing Wang
ViT
11
11
0
13 Sep 2022
The Weighting Game: Evaluating Quality of Explainability Methods
Lassi Raatikainen
Esa Rahtu
FAtt
XAI
11
3
0
12 Aug 2022
Multi-Attention Network for Compressed Video Referring Object Segmentation
Weidong Chen
Dexiang Hong
Yuankai Qi
Zhenjun Han
Shuhui Wang
Laiyun Qing
Qingming Huang
Guorong Li
VOS
18
35
0
26 Jul 2022
Object State Change Classification in Egocentric Videos using the Divided Space-Time Attention Mechanism
Md. Mohaiminul Islam
Gedas Bertasius
9
7
0
24 Jul 2022
High-Resolution Swin Transformer for Automatic Medical Image Segmentation
Chen Wei
Shenghan Ren
Kaitai Guo
Haihong Hu
Jimin Liang
ViT
OOD
MedIm
4
36
0
23 Jul 2022
Video Swin Transformers for Egocentric Video Understanding @ Ego4D Challenges 2022
María Escobar
Laura Alexandra Daza
Cristina González
Jordi Pont-Tuset
Pablo Arbelaez
11
8
0
22 Jul 2022
Human-to-Robot Imitation in the Wild
Shikhar Bahl
Abhi Gupta
Deepak Pathak
9
161
0
19 Jul 2022
Clover: Towards A Unified Video-Language Alignment and Fusion Model
Jingjia Huang
Yinan Li
Jiashi Feng
Xinglong Wu
Xiaoshuai Sun
Rongrong Ji
VLM
19
46
0
16 Jul 2022
Earthformer: Exploring Space-Time Transformers for Earth System Forecasting
Zhihan Gao
Xingjian Shi
Hao Wang
Yi Zhu
Yuyang Wang
Mu Li
Dit-Yan Yeung
AI4TS
24
144
0
12 Jul 2022
Large-scale Robustness Analysis of Video Action Recognition Models
Madeline Chantry Schiappa
Naman Biyani
Prudvi Kamtam
Shruti Vyas
Hamid Palangi
Vibhav Vineet
Y. S. Rawat
AAML
16
24
0
04 Jul 2022
Revisiting Classifier: Transferring Vision-Language Models for Video Recognition
Wenhao Wu
Zhun Sun
Wanli Ouyang
VLM
87
93
0
04 Jul 2022
OmniMAE: Single Model Masked Pretraining on Images and Videos
Rohit Girdhar
Alaaeldin El-Nouby
Mannat Singh
Kalyan Vasudev Alwala
Armand Joulin
Ishan Misra
ViT
19
95
0
16 Jun 2022
MLP-3D: A MLP-like 3D Architecture with Grouped Time Mixing
Zhaofan Qiu
Ting Yao
Chong-Wah Ngo
Tao Mei
ViT
22
15
0
13 Jun 2022
Seeing the forest and the tree: Building representations of both individual and collective dynamics with transformers
Ran Liu
Mehdi Azabou
M. Dabagia
Jingyun Xiao
Eva L. Dyer
AI4CE
27
19
0
10 Jun 2022
SimVP: Simpler yet Better Video Prediction
Zhangyang Gao
Cheng Tan
Lirong Wu
Stan Z. Li
14
208
0
09 Jun 2022
BasicTAD: an Astounding RGB-Only Baseline for Temporal Action Detection
Mingdong Yang
Guo Chen
Yin-Dong Zheng
Tong Lu
Limin Wang
21
45
0
05 May 2022
i-Code: An Integrative and Composable Multimodal Learning Framework
Ziyi Yang
Yuwei Fang
Chenguang Zhu
Reid Pryzant
Dongdong Chen
...
Bin Xiao
Yuanxun Lu
Takuya Yoshioka
Michael Zeng
Xuedong Huang
35
45
0
03 May 2022
NTIRE 2022 Challenge on Super-Resolution and Quality Enhancement of Compressed Video: Dataset, Methods and Results
Ren Yang
Radu Timofte
Mei Zheng
Qunliang Xing
Minglang Qiao
...
Yulin Huang
Junying Chen
I. Lee
Sunder Ali Khowaja
Jiseok Yoon
SupR
19
33
0
20 Apr 2022
Event Transformer. A sparse-aware solution for efficient event data processing
Alberto Sabater
Luis Montesano
Ana C. Murillo
19
50
0
07 Apr 2022
Joint Hand Motion and Interaction Hotspots Prediction from Egocentric Videos
Shao-Wei Liu
Subarna Tripathi
Somdeb Majumdar
Xiaolong Wang
EgoV
20
93
0
04 Apr 2022
Long Movie Clip Classification with State-Space Video Models
Md. Mohaiminul Islam
Gedas Bertasius
VLM
25
100
0
04 Apr 2022
ObjectFormer for Image Manipulation Detection and Localization
Junke Wang
Zuxuan Wu
Jingjing Chen
Xintong Han
Abhinav Shrivastava
Ser-Nam Lim
Yu-Gang Jiang
26
105
0
28 Mar 2022
Give Me Your Attention: Dot-Product Attention Considered Harmful for Adversarial Patch Robustness
Giulio Lovisotto
Nicole Finnie
Mauricio Muñoz
Chaithanya Kumar Mummadi
J. H. Metzen
AAML
ViT
17
32
0
25 Mar 2022
TransDARC: Transformer-based Driver Activity Recognition with Latent Space Feature Calibration
Kunyu Peng
Alina Roitberg
Kailun Yang
Jiaming Zhang
Rainer Stiefelhagen
ViT
27
32
0
02 Mar 2022
Temporal Perceiver: A General Architecture for Arbitrary Boundary Detection
Jing Tan
Yuhong Wang
Gangshan Wu
Limin Wang
39
14
0
01 Mar 2022
Spatio-temporal Vision Transformer for Super-resolution Microscopy
Charles N Christensen
M. Lu
Edward N. Ward
Pietro Lio'
C. Kaminski
11
8
0
28 Feb 2022
Multi-direction and Multi-scale Pyramid in Transformer for Video-based Pedestrian Retrieval
Xianghao Zang
Ge Li
Wei-Nan Gao
ViT
32
44
0
12 Feb 2022
UniFormer: Unifying Convolution and Self-attention for Visual Recognition
Kunchang Li
Yali Wang
Junhao Zhang
Peng Gao
Guanglu Song
Yu Liu
Hongsheng Li
Yu Qiao
ViT
142
360
0
24 Jan 2022
Video Transformers: A Survey
Javier Selva
A. S. Johansen
Sergio Escalera
Kamal Nasrollahi
T. Moeslund
Albert Clapés
ViT
20
101
0
16 Jan 2022
UniFormer: Unified Transformer for Efficient Spatiotemporal Representation Learning
Kunchang Li
Yali Wang
Peng Gao
Guanglu Song
Yu Liu
Hongsheng Li
Yu Qiao
ViT
17
235
0
12 Jan 2022
Swin UNETR: Swin Transformers for Semantic Segmentation of Brain Tumors in MRI Images
Ali Hatamizadeh
V. Nath
Yucheng Tang
Dong Yang
H. Roth
Daguang Xu
ViT
MedIm
15
1,015
0
04 Jan 2022
Masked Feature Prediction for Self-Supervised Visual Pre-Training
Chen Wei
Haoqi Fan
Saining Xie
Chaoxia Wu
Alan Yuille
Christoph Feichtenhofer
ViT
28
651
0
16 Dec 2021
Co-training Transformer with Videos and Images Improves Action Recognition
Bowen Zhang
Jiahui Yu
Christopher Fifty
Wei Han
Andrew M. Dai
Ruoming Pang
Fei Sha
ViT
20
54
0
14 Dec 2021
DualFormer: Local-Global Stratified Transformer for Efficient Video Recognition
Yuxuan Liang
Pan Zhou
Roger Zimmermann
Shuicheng Yan
ViT
15
21
0
09 Dec 2021
MViTv2: Improved Multiscale Vision Transformers for Classification and Detection
Yanghao Li
Chaoxia Wu
Haoqi Fan
K. Mangalam
Bo Xiong
Jitendra Malik
Christoph Feichtenhofer
ViT
25
671
0
02 Dec 2021
Self-supervised Video Transformer
Kanchana Ranasinghe
Muzammal Naseer
Salman Khan
F. Khan
Michael S. Ryoo
ViT
18
84
0
02 Dec 2021
Video-Text Pre-training with Learned Regions
Rui Yan
Mike Zheng Shou
Yixiao Ge
Alex Jinpeng Wang
Xudong Lin
Guanyu Cai
Jinhui Tang
22
23
0
02 Dec 2021
TransWeather: Transformer-based Restoration of Images Degraded by Adverse Weather Conditions
Jeya Maria Jose Valanarasu
R. Yasarla
Vishal M. Patel
ViT
32
274
0
29 Nov 2021
VIOLET : End-to-End Video-Language Transformers with Masked Visual-token Modeling
Tsu-jui Fu
Linjie Li
Zhe Gan
Kevin Qinghong Lin
W. Wang
Lijuan Wang
Zicheng Liu
VLM
34
215
0
24 Nov 2021
PhysFormer: Facial Video-based Physiological Measurement with Temporal Difference Transformer
Zitong Yu
Yuming Shen
Jingang Shi
Hengshuang Zhao
Philip H. S. Torr
Guoying Zhao
ViT
MedIm
127
162
0
23 Nov 2021
Swin Transformer V2: Scaling Up Capacity and Resolution
Ze Liu
Han Hu
Yutong Lin
Zhuliang Yao
Zhenda Xie
...
Yue Cao
Zheng-Wei Zhang
Li Dong
Furu Wei
B. Guo
ViT
23
1,713
0
18 Nov 2021
Ripple Attention for Visual Perception with Sub-quadratic Complexity
Lin Zheng
Huijie Pan
Lingpeng Kong
11
3
0
06 Oct 2021
Signal Transformer: Complex-valued Attention and Meta-Learning for Signal Recognition
Yihong Dong
Ying Peng
Muqiao Yang
Songtao Lu
Qingjiang Shi
38
8
0
05 Jun 2021
Previous
1
2
3
4
Next