Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2201.09450
Cited By
UniFormer: Unifying Convolution and Self-attention for Visual Recognition
24 January 2022
Kunchang Li
Yali Wang
Junhao Zhang
Peng Gao
Guanglu Song
Yu Liu
Hongsheng Li
Yu Qiao
ViT
Re-assign community
ArXiv
PDF
HTML
Papers citing
"UniFormer: Unifying Convolution and Self-attention for Visual Recognition"
18 / 18 papers shown
Title
Learning Streaming Video Representation via Multitask Training
Yibin Yan
Jilan Xu
Shangzhe Di
Yikun Liu
Yudi Shi
Qirui Chen
Zeqian Li
Yifei Huang
Weidi Xie
CLL
62
123
0
28 Apr 2025
OverLoCK: An Overview-first-Look-Closely-next ConvNet with Context-Mixing Dynamic Kernels
Meng Lou
Yizhou Yu
85
1
0
27 Feb 2025
Accessible, At-Home Detection of Parkinson's Disease via Multi-task Video Analysis
Md. Saiful Islam
Tariq Adnan
Jan Freyberg
Sangwu Lee
Abdelrahman Abdelkader
...
Cathe Schwartz
Karen Jaffe
Ruth B. Schneider
E. R. Dorsey
Ehsan Hoque
43
0
0
21 Jun 2024
MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer
Sachin Mehta
Mohammad Rastegari
ViT
152
1,027
0
05 Oct 2021
Mobile-Former: Bridging MobileNet and Transformer
Yinpeng Chen
Xiyang Dai
Dongdong Chen
Mengchen Liu
Xiaoyi Dong
Lu Yuan
Zicheng Liu
ViT
139
356
0
12 Aug 2021
VidTr: Video Transformer Without Convolutions
Yanyi Zhang
Xinyu Li
Chunhui Liu
Bing Shuai
Yi Zhu
Biagio Brattoli
Hao Chen
I. Marsic
Joseph Tighe
ViT
87
178
0
23 Apr 2021
Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions
Wenhai Wang
Enze Xie
Xiang Li
Deng-Ping Fan
Kaitao Song
Ding Liang
Tong Lu
Ping Luo
Ling Shao
ViT
247
2,898
0
24 Feb 2021
High-Performance Large-Scale Image Recognition Without Normalization
Andrew Brock
Soham De
Samuel L. Smith
Karen Simonyan
VLM
199
450
0
11 Feb 2021
Is Space-Time Attention All You Need for Video Understanding?
Gedas Bertasius
Heng Wang
Lorenzo Torresani
ViT
261
1,486
0
09 Feb 2021
TransReID: Transformer-based Object Re-Identification
Shuting He
Haowen Luo
Pichao Wang
F. Wang
Hao Li
Wei Jiang
ViT
175
600
0
08 Feb 2021
Video Transformer Network
Daniel Neimark
Omri Bar
Maya Zohar
Dotan Asselmann
ViT
178
375
0
01 Feb 2021
Bottleneck Transformers for Visual Recognition
A. Srinivas
Tsung-Yi Lin
Niki Parmar
Jonathon Shlens
Pieter Abbeel
Ashish Vaswani
SLR
232
840
0
27 Jan 2021
Multi-modal Transformer for Video Retrieval
Valentin Gabeur
Chen Sun
Alahari Karteek
Cordelia Schmid
ViT
381
532
0
21 Jul 2020
Grouped Spatial-Temporal Aggregation for Efficient Action Recognition
Chenxu Luo
Alan Yuille
94
142
0
28 Sep 2019
Deep High-Resolution Representation Learning for Visual Recognition
Jingdong Wang
Ke Sun
Tianheng Cheng
Borui Jiang
Chaorui Deng
...
Yadong Mu
Mingkui Tan
Xinggang Wang
Wenyu Liu
Bin Xiao
146
1,866
0
20 Aug 2019
MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
Andrew G. Howard
Menglong Zhu
Bo Chen
Dmitry Kalenichenko
Weijun Wang
Tobias Weyand
M. Andreetto
Hartwig Adam
3DH
924
18,450
0
17 Apr 2017
Aggregated Residual Transformations for Deep Neural Networks
Saining Xie
Ross B. Girshick
Piotr Dollár
Z. Tu
Kaiming He
239
6,278
0
16 Nov 2016
Semantic Understanding of Scenes through the ADE20K Dataset
Bolei Zhou
Hang Zhao
Xavier Puig
Tete Xiao
Sanja Fidler
Adela Barriuso
Antonio Torralba
SSeg
216
1,501
0
18 Aug 2016
1