2103.15691
Cited By
ViViT: A Video Vision Transformer
IEEE International Conference on Computer Vision (ICCV), 2021
29 March 2021
Anurag Arnab
Mostafa Dehghani
G. Heigold
Chen Sun
Mario Lucic
Cordelia Schmid
ViT
HuggingFace (3 upvotes)
Github (3544★)
Papers citing "ViViT: A Video Vision Transformer"
50 / 1,306 papers shown
Weakly-supervised segmentation of referring expressions
Robin Strudel
Ivan Laptev
Cordelia Schmid
213
28
0
10 May 2022
BasicTAD: an Astounding RGB-Only Baseline for Temporal Action Detection
Computer Vision and Image Understanding (CVIU), 2022
Mingdong Yang
Guo Chen
Yin-Dong Zheng
Tong Lu
Limin Wang
237
50
0
05 May 2022
P3IV: Probabilistic Procedure Planning from Instructional Videos with Weak Supervision
Computer Vision and Pattern Recognition (CVPR), 2022
Henghui Zhao
Isma Hadji
Nikita Dvornik
Konstantinos G. Derpanis
Richard P. Wildes
Allan D. Jepson
210
53
0
04 May 2022
CoCa: Contrastive Captioners are Image-Text Foundation Models
Jiahui Yu
Zirui Wang
Vijay Vasudevan
Legg Yeung
Mojtaba Seyedhosseini
Yonghui Wu
VLM
CLIP
OffRL
563
1,584
0
04 May 2022
i-Code: An Integrative and Composable Multimodal Learning Framework
AAAI Conference on Artificial Intelligence (AAAI), 2022
Ziyi Yang
Yuwei Fang
Chenguang Zhu
Reid Pryzant
DongDong Chen
...
Bin Xiao
Yuanxun Lu
Takuya Yoshioka
Michael Zeng
Xuedong Huang
252
53
0
03 May 2022
In Defense of Image Pre-Training for Spatiotemporal Recognition
European Conference on Computer Vision (ECCV), 2022
Xianhang Li
Huiyu Wang
Chen Wei
Jieru Mei
Alan Yuille
Yuyin Zhou
Cihang Xie
131
1
0
03 May 2022
The Wisdom of Crowds: Temporal Progressive Attention for Early Action Prediction
Computer Vision and Pattern Recognition (CVPR), 2022
Alexandros Stergiou
Dima Damen
AI4TS
EgoV
EDL
164
14
0
28 Apr 2022
Temporal Relevance Analysis for Video Action Models
Quanfu Fan
Donghyun Kim
Chun-Fu Chen
Stan Sclaroff
Kate Saenko
Sarah Adel Bargal
FAtt
153
1
0
25 Apr 2022
Transformation Invariant Cancerous Tissue Classification Using Spatially Transformed DenseNet
Omar Mahdi
Ali Bou Nassif
MedIm
74
2
0
23 Apr 2022
Progressive Training of A Two-Stage Framework for Video Restoration
Mei Zheng
Qunliang Xing
Minglang Qiao
Mai Xu
Lai Jiang
Huaida Liu
Ying-Cong Chen
217
14
0
21 Apr 2022
Disentangling Spatial-Temporal Functional Brain Networks via Twin-Transformers
Xiao-Wen Yu
Lu Zhang
Lin Zhao
Yanjun Lyu
Tianming Liu
Dajiang Zhu
86
11
0
20 Apr 2022
Less than Few: Self-Shot Video Instance Segmentation
European Conference on Computer Vision (ECCV), 2022
Pengwan Yang
Yuki M. Asano
Pascal Mettes
Cees G. M. Snoek
SSL
152
2
0
19 Apr 2022
Temporally Efficient Vision Transformer for Video Instance Segmentation
Computer Vision and Pattern Recognition (CVPR), 2022
Shusheng Yang
Xinggang Wang
Yu Li
Yuxin Fang
Jiemin Fang
Wenyu Liu
Xun Zhao
Ying Shan
ViT
168
77
0
18 Apr 2022
MST++: Multi-stage Spectral-wise Transformer for Efficient Spectral Reconstruction
Yuanhao Cai
Jing Lin
Zudi Lin
Haoqian Wang
Yulun Zhang
Hanspeter Pfister
Radu Timofte
Luc Van Gool
84
260
0
17 Apr 2022
Video Diffusion Models
Neural Information Processing Systems (NeurIPS), 2022
Jonathan Ho
Tim Salimans
Alexey A. Gritsenko
William Chan
Mohammad Norouzi
David J. Fleet
DiffM
VGen
782
2,171
0
07 Apr 2022
Surface Vision Transformers: Flexible Attention-Based Modelling of Biomedical Surfaces
Simon Dahan
Hao Xu
Logan Z. J. Williams
Abdulah Fawaz
Chunhui Yang
...
A. Edwards
M. Glasser
Alistair Young
Daniel Rueckert
E. C. Robinson
ViT
MedIm
200
1
0
07 Apr 2022
Event Transformer. A sparse-aware solution for efficient event data processing
Alberto Sabater
Luis Montesano
Ana C. Murillo
205
67
0
07 Apr 2022
Multi-scale Context-aware Network with Transformer for Gait Recognition
Duo-Lin Zhu
Xiaohui Huang
Xinggang Wang
Bo Yang
Botao He
Wenyu Liu
Bin Feng
ViT
CVBM
236
16
0
07 Apr 2022
Modeling Motion with Multi-Modal Features for Text-Based Video Segmentation
Computer Vision and Pattern Recognition (CVPR), 2022
Wangbo Zhao
Kai Wang
Xiangxiang Chu
Fuzhao Xue
Xinchao Wang
Yang You
205
30
0
06 Apr 2022
Dual-AI: Dual-path Actor Interaction Learning for Group Activity Recognition
Computer Vision and Pattern Recognition (CVPR), 2022
Mingfei Han
David Junhao Zhang
Yali Wang
Rui Yan
Weitong Chen
Xiaojun Chang
Yu Qiao
157
70
0
05 Apr 2022
MaxViT: Multi-Axis Vision Transformer
European Conference on Computer Vision (ECCV), 2022
Zhengzhong Tu
Hossein Talebi
Han Zhang
Feng Yang
P. Milanfar
A. Bovik
Yinxiao Li
ViT
459
867
0
04 Apr 2022
Long Movie Clip Classification with State-Space Video Models
European Conference on Computer Vision (ECCV), 2022
Md. Mohaiminul Islam
Gedas Bertasius
VLM
395
137
0
04 Apr 2022
Vision Transformer with Cross-attention by Temporal Shift for Efficient Action Recognition
Asian Conference on Computer Vision (ACCV), 2022
Ryota Hashiguchi
Toru Tamaki
211
6
0
01 Apr 2022
Deformable Video Transformer
Computer Vision and Pattern Recognition (CVPR), 2022
Jue Wang
Lorenzo Torresani
ViT
188
31
0
31 Mar 2022
MeMOT: Multi-Object Tracking with Memory
Computer Vision and Pattern Recognition (CVPR), 2022
Jiarui Cai
Mingze Xu
Wei Li
Yuanjun Xiong
Wei Xia
Zhuowen Tu
Stefano Soatto
VOT
246
210
0
31 Mar 2022
Stochastic Backpropagation: A Memory Efficient Strategy for Training Video Models
Computer Vision and Pattern Recognition (CVPR), 2022
Feng Cheng
Ming Xu
Yuanjun Xiong
Hao Chen
Xinyu Li
Wei Li
Wei Xia
111
18
0
31 Mar 2022
TubeDETR: Spatio-Temporal Video Grounding with Transformers
Computer Vision and Pattern Recognition (CVPR), 2022
Antoine Yang
Antoine Miech
Josef Sivic
Ivan Laptev
Cordelia Schmid
ViT
280
120
0
30 Mar 2022
VPTR: Efficient Transformers for Video Prediction
International Conference on Pattern Recognition (ICPR), 2022
Xi Ye
Guillaume-Alexandre Bilodeau
ViT
206
26
0
29 Mar 2022
End-to-End Compressed Video Representation Learning for Generic Event Boundary Detection
Computer Vision and Pattern Recognition (CVPR), 2022
Congcong Li
Xinyao Wang
Longyin Wen
Dexiang Hong
Tiejian Luo
Libo Zhang
128
20
0
29 Mar 2022
Exploring Intra- and Inter-Video Relation for Surgical Semantic Scene Segmentation
IEEE Transactions on Medical Imaging (IEEE TMI), 2022
Yueming Jin
Yang Yu
Cheng Chen
Zixu Zhao
Pheng-Ann Heng
Danail Stoyanov
179
47
0
29 Mar 2022
Frame-wise Action Representations for Long Videos via Sequence Contrastive Learning
Computer Vision and Pattern Recognition (CVPR), 2022
Minghao Chen
Fangyun Wei
Chong Li
Deng Cai
AI4TS
212
44
0
28 Mar 2022
Bridge-Prompt: Towards Ordinal Action Understanding in Instructional Videos
Computer Vision and Pattern Recognition (CVPR), 2022
Muheng Li
Lei Chen
Yueqi Duan
Zhilan Hu
Jianjiang Feng
Jie Zhou
Jiwen Lu
152
91
0
26 Mar 2022
Give Me Your Attention: Dot-Product Attention Considered Harmful for Adversarial Patch Robustness
Computer Vision and Pattern Recognition (CVPR), 2022
Giulio Lovisotto
Nicole Finnie
Mauricio Muñoz
Chaithanya Kumar Mummadi
J. H. Metzen
AAML
ViT
118
47
0
25 Mar 2022
RayTran: 3D pose estimation and shape reconstruction of multiple objects from videos with ray-traced transformers
European Conference on Computer Vision (ECCV), 2022
M. Tyszkiewicz
Kevis-Kokitsi Maninis
S. Popov
V. Ferrari
ViT
254
21
0
24 Mar 2022
Self-supervised Video-centralised Transformer for Video Face Clustering
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Yujiang Wang
Mingzhi Dong
Jie Shen
Yi-Si Luo
Yiming Lin
Pingchuan Ma
Stavros Petridis
Maja Pantic
ViT
298
4
0
24 Mar 2022
Transformers Meet Visual Learning Understanding: A Comprehensive Review
Yuting Yang
Licheng Jiao
Xuantong Liu
Fan Liu
Shuyuan Yang
Zhixi Feng
Xu Tang
ViT
MedIm
202
34
0
24 Mar 2022
VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
Neural Information Processing Systems (NeurIPS), 2022
Zhan Tong
Yibing Song
Jue Wang
Limin Wang
ViT
656
1,591
0
23 Mar 2022
Deep Frequency Filtering for Domain Generalization
Computer Vision and Pattern Recognition (CVPR), 2022
Shiqi Lin
Zhizheng Zhang
Zhipeng Huang
Yan Lu
Cuiling Lan
...
Jiang Wang
Zicheng Liu
Amey Parulkar
V. Navkal
Zhibo Chen
230
67
0
23 Mar 2022
Contrastive Transformer-based Multiple Instance Learning for Weakly Supervised Polyp Frame Detection
International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2022
Yu Tian
Guansong Pang
Fengbei Liu
Yuyuan Liu
Chong Wang
Yuanhong Chen
Johan Verjans
G. Carneiro
ViT
MedIm
222
34
0
23 Mar 2022
Scalable Video Object Segmentation with Identification Mechanism
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Zongxin Yang
Jiaxu Miao
Yunchao Wei
Wenguan Wang
Xiaohan Wang
Yi Yang
VOS
388
36
0
22 Mar 2022
FAR: Fourier Aerial Video Recognition
European Conference on Computer Vision (ECCV), 2022
D. Kothandaraman
Tianrui Guan
Xijun Wang
Sean Hu
Ming-Shun Lin
Tianyi Zhou
217
19
0
21 Mar 2022
DirecFormer: A Directed Attention in Transformer Approach to Robust Action Recognition
Computer Vision and Pattern Recognition (CVPR), 2022
Thanh-Dat Truong
Quoc-Huy Bui
C. Duong
Han-Seok Seo
Son Lam Phung
Xin Li
Khoa Luu
ViT
211
69
0
19 Mar 2022
Three things everyone should know about Vision Transformers
European Conference on Computer Vision (ECCV), 2022
Hugo Touvron
Matthieu Cord
Alaaeldin El-Nouby
Jakob Verbeek
Edouard Grave
ViT
223
150
0
18 Mar 2022
Group Contextualization for Video Recognition
Computer Vision and Pattern Recognition (CVPR), 2022
Y. Hao
Haotong Zhang
Chong-Wah Ngo
Xiangnan He
116
34
0
18 Mar 2022
Look Outside the Room: Synthesizing A Consistent Long-Term 3D Scene Video from A Single Image
Computer Vision and Pattern Recognition (CVPR), 2022
Xuanchi Ren
Xiaolong Wang
VGen
186
65
0
17 Mar 2022
BrainGB: A Benchmark for Brain Network Analysis with Graph Neural Networks
IEEE Transactions on Medical Imaging (IEEE TMI), 2022
Hejie Cui
Wei Dai
Yanqiao Zhu
Xuan Kan
Antonio Aodong Chen Gu
Joshua Lukemire
Chen Tang
Lifang He
Ying Guo
Carl Yang
226
165
0
17 Mar 2022
Patch-Fool: Are Vision Transformers Always Robust Against Adversarial Perturbations?
International Conference on Learning Representations (ICLR), 2022
Y. Fu
Shunyao Zhang
Shan-Hung Wu
Cheng Wan
Yingyan Lin
AAML
348
82
0
16 Mar 2022
The Principle of Diversity: Training Stronger Vision Transformers Calls for Reducing All Levels of Redundancy
Computer Vision and Pattern Recognition (CVPR), 2022
Tianlong Chen
Zhenyu Zhang
Yu Cheng
Ahmed Hassan Awadallah
Zinan Lin
ViT
222
49
0
12 Mar 2022
Anti-Oversmoothing in Deep Vision Transformers via the Fourier Domain Analysis: From Theory to Practice
International Conference on Learning Representations (ICLR), 2022
Peihao Wang
Wenqing Zheng
Tianlong Chen
Zinan Lin
ViT
227
189
0
09 Mar 2022
AssistQ: Affordance-centric Question-driven Task Completion for Egocentric Assistant
European Conference on Computer Vision (ECCV), 2022
B. Wong
Joya Chen
You Wu
Stan Weixian Lei
Dongxing Mao
Difei Gao
Mike Zheng Shou
EgoV
384
33
0
08 Mar 2022