UniFormerV2: Spatiotemporal Learning by Arming Image ViTs with Video UniFormer
arXiv:2211.09552 · 17 November 2022
Kunchang Li, Yali Wang, Yinan He, Yizhuo Li, Yi Wang, Limin Wang, Yu Qiao · Tags: ViT
Papers citing "UniFormerV2: Spatiotemporal Learning by Arming Image ViTs with Video UniFormer" (17 / 17 shown)

Towards Application-Specific Evaluation of Vision Models: Case Studies in Ecology and Biology
A. H. H. Chan, Otto Brookes, Urs Waldmann, Hemal Naik, I. Couzin, ..., Lukas Boesch, M. Arandjelovic, H. Kühl, T. Burghardt, Fumihiro Kano · 05 May 2025

Beyond the Horizon: Decoupling UAVs Multi-View Action Recognition via Partial Order Transfer
Wenxuan Liu, X. Zhong, Zhuo Zhou, S. Yang, Chia-Wen Lin, Alex Chichung Kot · 29 Apr 2025

Ctrl-GenAug: Controllable Generative Augmentation for Medical Sequence Classification
Xinrui Zhou, Yuhao Huang, Haoran Dou, Shijing Chen, Ao Chang, ..., Jie Jessie Ren, Ruobing Huang, Jun Cheng, Wufeng Xue, Dong Ni · Tags: MedIm · 25 Sep 2024

LEAP: LLM-Generation of Egocentric Action Programs
Eadom Dessalene, Michael Maynord, Cornelia Fermuller, Yiannis Aloimonos · 29 Nov 2023

OmniVec: Learning robust representations with cross modal sharing
Siddharth Srivastava, Gaurav Sharma · Tags: SSL · 07 Nov 2023

What Can Simple Arithmetic Operations Do for Temporal Modeling?
Wenhao Wu, Yuxin Song, Zhun Sun, Jingdong Wang, Chang Xu, Wanli Ouyang · 18 Jul 2023

EVA: Exploring the Limits of Masked Visual Representation Learning at Scale
Yuxin Fang, Wen Wang, Binhui Xie, Quan-Sen Sun, Ledell Yu Wu, Xinggang Wang, Tiejun Huang, Xinlong Wang, Yue Cao · Tags: VLM, CLIP · 14 Nov 2022

You Only Need 90K Parameters to Adapt Light: A Light Weight Transformer for Image Enhancement and Exposure Correction
Ziteng Cui, Kunchang Li, Lin Gu, Sheng Su, Peng Gao, Zhengkai Jiang, Yu Qiao, Tatsuya Harada · Tags: ViT · 30 May 2022

UniFormer: Unifying Convolution and Self-attention for Visual Recognition
Kunchang Li, Yali Wang, Junhao Zhang, Peng Gao, Guanglu Song, Yu Liu, Hongsheng Li, Yu Qiao · Tags: ViT · 24 Jan 2022

Masked Autoencoders Are Scalable Vision Learners
Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, Ross B. Girshick · Tags: ViT, TPM · 11 Nov 2021

How Much Can CLIP Benefit Vision-and-Language Tasks?
Sheng Shen, Liunian Harold Li, Hao Tan, Mohit Bansal, Anna Rohrbach, Kai-Wei Chang, Z. Yao, Kurt Keutzer · Tags: CLIP, VLM, MLLM · 13 Jul 2021

Emerging Properties in Self-Supervised Vision Transformers
Mathilde Caron, Hugo Touvron, Ishan Misra, Hervé Jégou, Julien Mairal, Piotr Bojanowski, Armand Joulin · 29 Apr 2021

VidTr: Video Transformer Without Convolutions
Yanyi Zhang, Xinyu Li, Chunhui Liu, Bing Shuai, Yi Zhu, Biagio Brattoli, Hao Chen, I. Marsic, Joseph Tighe · Tags: ViT · 23 Apr 2021

CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval
Huaishao Luo, Lei Ji, Ming Zhong, Yang Chen, Wen Lei, Nan Duan, Tianrui Li · Tags: CLIP, VLM · 18 Apr 2021

Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions
Wenhai Wang, Enze Xie, Xiang Li, Deng-Ping Fan, Kaitao Song, Ding Liang, Tong Lu, Ping Luo, Ling Shao · Tags: ViT · 24 Feb 2021

Is Space-Time Attention All You Need for Video Understanding?
Gedas Bertasius, Heng Wang, Lorenzo Torresani · Tags: ViT · 09 Feb 2021

Video Transformer Network
Daniel Neimark, Omri Bar, Maya Zohar, Dotan Asselmann · Tags: ViT · 01 Feb 2021