arXiv:2205.14949
HiViT: Hierarchical Vision Transformer Meets Masked Image Modeling
30 May 2022
Xiaosong Zhang
Yunjie Tian
Wei Huang
QiXiang Ye
Qi Dai
Lingxi Xie
Qi Tian
Papers citing "HiViT: Hierarchical Vision Transformer Meets Masked Image Modeling"
26 papers
Image Recognition with Online Lightweight Vision Transformer: A Survey
Zherui Zhang
Rongtao Xu
Jie Zhou
Changwei Wang
Xingtian Pei
...
Jiguang Zhang
Li Guo
Longxiang Gao
W. Xu
Shibiao Xu
ViT
36
0
0
06 May 2025
Structured-Noise Masked Modeling for Video, Audio and Beyond
Aritra Bhowmik
Fida Mohammad Thoker
Carlos Hinojosa
Bernard Ghanem
Cees G. M. Snoek
VGen
54
0
0
20 Mar 2025
Personalized Large Vision-Language Models
Chau Pham
Hoang Phan
David Doermann
Yunjie Tian
VLM
41
3
0
23 Dec 2024
GG-SSMs: Graph-Generating State Space Models
Nikola Zubić
Davide Scaramuzza
Mamba
74
1
0
17 Dec 2024
Mamba YOLO: SSMs-Based YOLO For Object Detection
Zeyu Wang
Chen Li
Huiying Xu
Xinzhong Zhu
Mamba
34
13
0
09 Jun 2024
LW-DETR: A Transformer Replacement to YOLO for Real-Time Detection
Qiang Chen
Xiangbo Su
Xinyu Zhang
Jian Wang
Jiahui Chen
...
Shan Zhang
Kun Yao
Errui Ding
Gang Zhang
Jingdong Wang
ViT
31
5
0
05 Jun 2024
ViG: Linear-complexity Visual Sequence Learning with Gated Linear Attention
Bencheng Liao
Xinggang Wang
Lianghui Zhu
Qian Zhang
Chang Huang
37
3
0
28 May 2024
Masked Modeling for Self-supervised Representation Learning on Vision and Beyond
Siyuan Li
Luyuan Zhang
Zedong Wang
Di Wu
Lirong Wu
...
Jun-Xiong Xia
Cheng Tan
Yang Liu
Baigui Sun
Stan Z. Li
SSL
21
13
0
31 Dec 2023
Hierarchical Side-Tuning for Vision Transformers
Weifeng Lin
Ziheng Wu
Wentao Yang
Mingxin Huang
Jun Huang
Lianwen Jin
13
3
0
09 Oct 2023
Spatial Transform Decoupling for Oriented Object Detection
Hongtian Yu
Yunjie Tian
QiXiang Ye
Yunfan Liu
16
26
0
21 Aug 2023
Self-Calibrated Cross Attention Network for Few-Shot Segmentation
Qianxiong Xu
Wenting Zhao
Guosheng Lin
Cheng Long
12
13
0
18 Aug 2023
Diffusion Models as Masked Autoencoders
Chen Wei
K. Mangalam
Po-Yao (Bernie) Huang
Yanghao Li
Haoqi Fan
Hu Xu
Huiyu Wang
Cihang Xie
Alan Yuille
Christoph Feichtenhofer
DiffM
SyDa
18
47
0
06 Apr 2023
CAE v2: Context Autoencoder with CLIP Target
Xinyu Zhang
Jiahui Chen
Junkun Yuan
Qiang Chen
Jian Wang
...
Jimin Pi
Kun Yao
Junyu Han
Errui Ding
Jingdong Wang
VLM
CLIP
24
24
0
17 Nov 2022
Rethinking Hierarchies in Pre-trained Plain Vision Transformer
Yufei Xu
Jing Zhang
Qiming Zhang
Dacheng Tao
9
1
0
03 Nov 2022
SimpleClick: Interactive Image Segmentation with Simple Vision Transformers
Qin Liu
Zhenlin Xu
Gedas Bertasius
Marc Niethammer
16
79
0
20 Oct 2022
A Unified View of Masked Image Modeling
Zhiliang Peng
Li Dong
Hangbo Bao
QiXiang Ye
Furu Wei
VLM
44
35
0
19 Oct 2022
A Survey on Masked Autoencoder for Self-supervised Learning in Vision and Beyond
Chaoning Zhang
Chenshuang Zhang
Junha Song
John Seon Keun Yi
Kang Zhang
In So Kweon
SSL
26
70
0
30 Jul 2022
Context Autoencoder for Self-Supervised Representation Learning
Xiaokang Chen
Mingyu Ding
Xiaodi Wang
Ying Xin
Shentong Mo
Yunhao Wang
Shumin Han
Ping Luo
Gang Zeng
Jingdong Wang
SSL
11
384
0
07 Feb 2022
Masked Autoencoders Are Scalable Vision Learners
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT
TPM
255
7,337
0
11 Nov 2021
UFO-ViT: High Performance Linear Vision Transformer without Softmax
Jeonggeun Song
ViT
104
16
0
29 Sep 2021
MLP-Mixer: An all-MLP Architecture for Vision
Ilya O. Tolstikhin
N. Houlsby
Alexander Kolesnikov
Lucas Beyer
Xiaohua Zhai
...
Andreas Steiner
Daniel Keysers
Jakob Uszkoreit
Mario Lucic
Alexey Dosovitskiy
239
2,554
0
04 May 2021
Emerging Properties in Self-Supervised Vision Transformers
Mathilde Caron
Hugo Touvron
Ishan Misra
Hervé Jégou
Julien Mairal
Piotr Bojanowski
Armand Joulin
283
5,723
0
29 Apr 2021
Transformer in Transformer
Kai Han
An Xiao
Enhua Wu
Jianyuan Guo
Chunjing Xu
Yunhe Wang
ViT
276
1,490
0
27 Feb 2021
Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions
Wenhai Wang
Enze Xie
Xiang Li
Deng-Ping Fan
Kaitao Song
Ding Liang
Tong Lu
Ping Luo
Ling Shao
ViT
263
3,538
0
24 Feb 2021
Zero-Shot Text-to-Image Generation
Aditya A. Ramesh
Mikhail Pavlov
Gabriel Goh
Scott Gray
Chelsea Voss
Alec Radford
Mark Chen
Ilya Sutskever
VLM
253
4,735
0
24 Feb 2021
Bottleneck Transformers for Visual Recognition
A. Srinivas
Tsung-Yi Lin
Niki Parmar
Jonathon Shlens
Pieter Abbeel
Ashish Vaswani
SLR
262
955
0
27 Jan 2021