2204.02964
Unleashing Vanilla Vision Transformer with Masked Image Modeling for Object Detection
6 April 2022
Yuxin Fang
Shusheng Yang
Shijie Wang
Yixiao Ge
Ying Shan
Xinggang Wang
Papers citing "Unleashing Vanilla Vision Transformer with Masked Image Modeling for Object Detection"
37 / 37 papers shown
Simpler Fast Vision Transformers with a Jumbo CLS Token
A. Fuller
Yousef Yassin
Daniel G. Kyrollos
Evan Shelhamer
James R. Green
72
0
0
24 Feb 2025
Self-Supervised Learning for Real-World Object Detection: a Survey
Alina Ciocarlan
Sidonie Lefebvre
S. L. Hégarat-Mascle
Arnaud Woiselle
ObjD
36
0
0
09 Oct 2024
A Review of Transformer-Based Models for Computer Vision Tasks: Capturing Global Context and Spatial Relationships
Gracile Astlin Pereira
Muhammad Hussain
ViT
37
7
0
27 Aug 2024
VFM-Det: Towards High-Performance Vehicle Detection via Large Foundation Models
Wentao Wu
Fanghua Hong
Xiao Wang
Chenglong Li
Jin Tang
VLM
59
1
0
23 Aug 2024
Masked Image Modeling: A Survey
Vlad Hondru
Florinel-Alin Croitoru
Shervin Minaee
Radu Tudor Ionescu
N. Sebe
69
6
0
13 Aug 2024
Robust Adaptation of Foundation Models with Black-Box Visual Prompting
Changdae Oh
Gyeongdeok Seo
Geunyoung Jung
Zhi-Qi Cheng
Hosik Choi
Jiyoung Jung
Kyungwoo Song
VLM
38
1
0
04 Jul 2024
Adapting Pretrained ViTs with Convolution Injector for Visuo-Motor Control
Dongyoon Hwang
ByungKun Lee
Hojoon Lee
Hyunseung Kim
Jaegul Choo
53
0
0
10 Jun 2024
ViG: Linear-complexity Visual Sequence Learning with Gated Linear Attention
Bencheng Liao
Xinggang Wang
Lianghui Zhu
Qian Zhang
Chang Huang
54
4
0
28 May 2024
Playing to Vision Foundation Model's Strengths in Stereo Matching
Chuangwei Liu
Qijun Chen
Rui Fan
40
12
0
09 Apr 2024
Pneumonia App: a mobile application for efficient pediatric pneumonia diagnosis using explainable convolutional neural networks (CNN)
Jiaming Deng
Zhenglin Chen
Minjiang Chen
Lulu Xu
Jiaqi Yang
Zhendong Luo
Peiwu Qin
54
2
0
31 Mar 2024
Once for Both: Single Stage of Importance and Sparsity Search for Vision Transformer Compression
Hancheng Ye
Chong Yu
Peng Ye
Renqiu Xia
Yansong Tang
Jiwen Lu
Tao Chen
Bo-Wen Zhang
53
3
0
23 Mar 2024
HCPM: Hierarchical Candidates Pruning for Efficient Detector-Free Matching
Ying Chen
Yong-Jin Liu
Kai Wu
Qiang Nie
Shang Xu
Huifang Ma
Bing Wang
Chengjie Wang
VLM
40
1
0
19 Mar 2024
Open-Vocabulary SAM: Segment and Recognize Twenty-thousand Classes Interactively
Haobo Yuan
Xiangtai Li
Chong Zhou
Yining Li
Kai Chen
Chen Change Loy
VLM
29
51
0
05 Jan 2024
Masked Modeling for Self-supervised Representation Learning on Vision and Beyond
Siyuan Li
Luyuan Zhang
Zedong Wang
Di Wu
Lirong Wu
...
Jun-Xiong Xia
Cheng Tan
Yang Liu
Baigui Sun
Stan Z. Li
SSL
36
14
0
31 Dec 2023
Win-Win: Training High-Resolution Vision Transformers from Two Windows
Vincent Leroy
Jérôme Revaud
Thomas Lucas
Philippe Weinzaepfel
ViT
36
2
0
01 Oct 2023
Spatial Transform Decoupling for Oriented Object Detection
Hongtian Yu
Yunjie Tian
QiXiang Ye
Yunfan Liu
37
26
0
21 Aug 2023
DETR Doesn't Need Multi-Scale or Locality Design
Yutong Lin
Yuhui Yuan
Zheng-Wei Zhang
Chen Li
Nanning Zheng
Han Hu
37
5
0
03 Aug 2023
Stitched ViTs are Flexible Vision Backbones
Zizheng Pan
Jing Liu
Haoyu He
Jianfei Cai
Bohan Zhuang
20
2
0
30 Jun 2023
ViTMatte: Boosting Image Matting with Pretrained Plain Vision Transformers
J. Yao
Xinggang Wang
Shusheng Yang
Baoyuan Wang
ViT
35
57
0
24 May 2023
BlackVIP: Black-Box Visual Prompting for Robust Transfer Learning
Changdae Oh
Hyeji Hwang
Hee-young Lee
Yongtaek Lim
Geunyoung Jung
Jiyoung Jung
Hosik Choi
Kyungwoo Song
VLM
VPVLM
85
57
0
26 Mar 2023
EVA-02: A Visual Representation for Neon Genesis
Yuxin Fang
Quan-Sen Sun
Xinggang Wang
Tiejun Huang
Xinlong Wang
Yue Cao
VLM
ViT
CLIP
40
259
0
20 Mar 2023
RILS: Masked Visual Reconstruction in Language Semantic Space
Shusheng Yang
Yixiao Ge
Kun Yi
Dian Li
Ying Shan
Xiaohu Qie
Xinggang Wang
CLIP
43
11
0
17 Jan 2023
AdaMAE: Adaptive Masking for Efficient Spatiotemporal Learning with Masked Autoencoders
W. G. C. Bandara
Naman Patel
A. Gholami
Mehdi Nikkhah
M. Agrawal
Vishal M. Patel
25
39
0
16 Nov 2022
RegCLR: A Self-Supervised Framework for Tabular Representation Learning in the Wild
Weiyao Wang
Byung-Hak Kim
Varun Ganapathi
SSL
LMTD
27
1
0
02 Nov 2022
Can We Solve 3D Vision Tasks Starting from A 2D Vision Transformer?
Yi Wang
Zhiwen Fan
Tianlong Chen
Hehe Fan
Zhangyang Wang
ViT
53
9
0
15 Sep 2022
Robust Multi-Object Tracking by Marginal Inference
Yifu Zhang
Chunyu Wang
Xinggang Wang
Wenjun Zeng
Wenyu Liu
VOT
26
17
0
07 Aug 2022
A Survey on Masked Autoencoder for Self-supervised Learning in Vision and Beyond
Chaoning Zhang
Chenshuang Zhang
Junha Song
John Seon Keun Yi
Kang Zhang
In So Kweon
SSL
57
71
0
30 Jul 2022
Integrally Migrating Pre-trained Transformer Encoder-decoders for Visual Object Detection
Feng Liu
Xiaosong Zhang
Zhiliang Peng
Zonghao Guo
Fang Wan
Xian-Wei Ji
QiXiang Ye
ObjD
43
20
0
19 May 2022
Vision Transformer Adapter for Dense Predictions
Zhe Chen
Yuchen Duan
Wenhai Wang
Junjun He
Tong Lu
Jifeng Dai
Yu Qiao
43
542
0
17 May 2022
ConvMAE: Masked Convolution Meets Masked Autoencoders
Peng Gao
Teli Ma
Hongsheng Li
Ziyi Lin
Jifeng Dai
Yu Qiao
ViT
19
121
0
08 May 2022
mc-BEiT: Multi-choice Discretization for Image BERT Pre-training
Xiaotong Li
Yixiao Ge
Kun Yi
Zixuan Hu
Ying Shan
Ling-yu Duan
37
38
0
29 Mar 2022
Masked Autoencoders Are Scalable Vision Learners
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT
TPM
305
7,443
0
11 Nov 2021
ByteTrack: Multi-Object Tracking by Associating Every Detection Box
Yifu Zhang
Pei Sun
Yi-Xin Jiang
Dongdong Yu
Fucheng Weng
Zehuan Yuan
Ping Luo
Wenyu Liu
Xinggang Wang
VOT
107
1,330
0
13 Oct 2021
Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions
Wenhai Wang
Enze Xie
Xiang Li
Deng-Ping Fan
Kaitao Song
Ding Liang
Tong Lu
Ping Luo
Ling Shao
ViT
277
3,623
0
24 Feb 2021
Simple Copy-Paste is a Strong Data Augmentation Method for Instance Segmentation
Golnaz Ghiasi
Huayu Chen
A. Srinivas
Rui Qian
Nayeon Lee
E. D. Cubuk
Quoc V. Le
Barret Zoph
ISeg
252
968
0
13 Dec 2020
Aggregated Residual Transformations for Deep Neural Networks
Saining Xie
Ross B. Girshick
Piotr Dollár
Z. Tu
Kaiming He
297
10,220
0
16 Nov 2016
ImageNet Large Scale Visual Recognition Challenge
Olga Russakovsky
Jia Deng
Hao Su
J. Krause
S. Satheesh
...
A. Karpathy
A. Khosla
Michael S. Bernstein
Alexander C. Berg
Li Fei-Fei
VLM
ObjD
296
39,198
0
01 Sep 2014