Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2204.07118
Cited By
DeiT III: Revenge of the ViT
14 April 2022
Hugo Touvron
Matthieu Cord
Hervé Jégou
ViT
Re-assign community
ArXiv
PDF
HTML
Papers citing
"DeiT III: Revenge of the ViT"
50 / 72 papers shown
Title
Improving the Reproducibility of Deep Learning Software: An Initial Investigation through a Case Study Analysis
Nikita Ravi
Abhinav Goel
James C. Davis
George K. Thiruvathukal
35
0
0
06 May 2025
Perception Encoder: The best visual embeddings are not at the output of the network
Daniel Bolya
Po-Yao (Bernie) Huang
Peize Sun
Jang Hyun Cho
Andrea Madotto
...
Shiyu Dong
Nikhila Ravi
Daniel Li
Piotr Dollár
Christoph Feichtenhofer
ObjD
VOS
103
0
0
17 Apr 2025
USP: Unified Self-Supervised Pretraining for Image Generation and Understanding
Xiangxiang Chu
Renda Li
Yong Wang
60
0
0
08 Mar 2025
Solving Instance Detection from an Open-World Perspective
Qianqian Shen
Yunhan Zhao
Nahyun Kwon
Jeeeun Kim
Yanan Li
Shu Kong
32
0
0
01 Mar 2025
Data-free Universal Adversarial Perturbation with Pseudo-semantic Prior
Chanhui Lee
Yeonghwan Song
Jeany Son
AAML
85
0
0
28 Feb 2025
Vision-LSTM: xLSTM as Generic Vision Backbone
Benedikt Alkin
M. Beck
Korbinian Poppel
Sepp Hochreiter
Johannes Brandstetter
VLM
56
39
0
24 Feb 2025
Disentangling Visual Transformers: Patch-level Interpretability for Image Classification
Guillaume Jeanneret
Loïc Simon
F. Jurie
ViT
44
0
0
24 Feb 2025
UNIP: Rethinking Pre-trained Attention Patterns for Infrared Semantic Segmentation
Tao Zhang
Jinyong Wen
Zhen Chen
Kun Ding
S. Xiang
Chunhong Pan
72
1
0
04 Feb 2025
Navigating the Maze of Explainable AI: A Systematic Approach to Evaluating Methods and Metrics
Lukas Klein
Carsten T. Lüth
U. Schlegel
Till J. Bungert
Mennatallah El-Assady
Paul F. Jäger
XAI
ELM
34
1
0
03 Jan 2025
EfficientViM: Efficient Vision Mamba with Hidden State Mixer based State Space Duality
Sanghyeok Lee
Joonmyung Choi
Hyunwoo J. Kim
110
3
0
22 Nov 2024
A Survey of the Self Supervised Learning Mechanisms for Vision Transformers
Asifullah Khan
A. Sohail
M. Fiaz
Mehdi Hassan
Tariq Habib Afridi
...
Muhammad Zaigham Zaheer
Kamran Ali
Tangina Sultana
Ziaurrehman Tanoli
Naeem Akhter
41
3
0
30 Aug 2024
MambaVision: A Hybrid Mamba-Transformer Vision Backbone
Ali Hatamizadeh
Jan Kautz
Mamba
33
56
0
10 Jul 2024
Reducing Fine-Tuning Memory Overhead by Approximate and Memory-Sharing Backpropagation
Yuchen Yang
Yingdong Shi
Cheems Wang
Xiantong Zhen
Yuxuan Shi
Jun Xu
32
1
0
24 Jun 2024
Scaling White-Box Transformers for Vision
Jinrui Yang
Xianhang Li
Druv Pai
Yuyin Zhou
Yi-An Ma
Yaodong Yu
Cihang Xie
ViT
34
9
0
30 May 2024
Wavelet-Based Image Tokenizer for Vision Transformers
Zhenhai Zhu
Radu Soricut
ViT
27
3
0
28 May 2024
Adapting Pre-Trained Vision Models for Novel Instance Detection and Segmentation
Ya Lu
Jishnu Jaykumar
Yunhui Guo
Nicholas Ruozzi
Yu Xiang
VLM
ISeg
48
4
0
28 May 2024
Understanding the differences in Foundation Models: Attention, State Space Models, and Recurrent Neural Networks
Jerome Sieber
Carmen Amo Alonso
A. Didier
M. Zeilinger
Antonio Orvieto
AAML
42
8
0
24 May 2024
LookHere: Vision Transformers with Directed Attention Generalize and Extrapolate
A. Fuller
Daniel G. Kyrollos
Yousef Yassin
James R. Green
34
2
0
22 May 2024
Transformer-Based Classification Outcome Prediction for Multimodal Stroke Treatment
Danqing Ma
Meng Wang
Ao Xiang
Zongqing Qi
Qin Yang
27
18
0
19 Apr 2024
HIRI-ViT: Scaling Vision Transformer with High Resolution Inputs
Ting Yao
Yehao Li
Yingwei Pan
Tao Mei
ViT
23
15
0
18 Mar 2024
Setting the Record Straight on Transformer Oversmoothing
G. Dovonon
M. Bronstein
Matt J. Kusner
20
5
0
09 Jan 2024
Morphing Tokens Draw Strong Masked Image Models
Taekyung Kim
Byeongho Heo
Dongyoon Han
44
3
0
30 Dec 2023
4M: Massively Multimodal Masked Modeling
David Mizrahi
Roman Bachmann
Ouguzhan Fatih Kar
Teresa Yeo
Mingfei Gao
Afshin Dehghan
Amir Zamir
MLLM
34
62
0
11 Dec 2023
Diversify, Don't Fine-Tune: Scaling Up Visual Recognition Training with Synthetic Images
Zhuoran Yu
Chenchen Zhu
Sean Culatana
Raghuraman Krishnamoorthi
Fanyi Xiao
Yong Jae Lee
109
13
0
04 Dec 2023
SCHEME: Scalable Channel Mixer for Vision Transformers
Deepak Sridhar
Yunsheng Li
Nuno Vasconcelos
18
0
0
01 Dec 2023
A Lightweight Clustering Framework for Unsupervised Semantic Segmentation
Yau Shing Jonathan Cheung
Xi Chen
Lihe Yang
Hengshuang Zhao
12
1
0
30 Nov 2023
No Representation Rules Them All in Category Discovery
S. Vaze
Andrea Vedaldi
Andrew Zisserman
OOD
24
31
0
28 Nov 2023
Minimalist and High-Performance Semantic Segmentation with Plain Vision Transformers
Yuanduo Hong
Jue Wang
Weichao Sun
Huihui Pan
VLM
ViT
27
7
0
19 Oct 2023
Training a Large Video Model on a Single Machine in a Day
Yue Zhao
Philipp Krahenbuhl
VLM
25
15
0
28 Sep 2023
Vision Transformers Need Registers
Zilong Chen
Maxime Oquab
Julien Mairal
Huaping Liu
ViT
35
311
0
28 Sep 2023
PanoSwin: a Pano-style Swin Transformer for Panorama Understanding
Zhixin Ling
Zhen Xing
Xiangdong Zhou
Manliang Cao
G. Zhou
ViT
16
17
0
28 Aug 2023
TurboViT: Generating Fast Vision Transformers via Generative Architecture Search
Alexander Wong
Saad Abbasi
Saeejith Nair
ViT
15
1
0
22 Aug 2023
Which Transformer to Favor: A Comparative Analysis of Efficiency in Vision Transformers
Tobias Christian Nauen
Sebastián M. Palacio
Federico Raue
Andreas Dengel
35
3
0
18 Aug 2023
Distributionally Robust Classification on a Data Budget
Ben Feuer
Ameya Joshi
Minh Pham
C. Hegde
OOD
22
2
0
07 Aug 2023
Get the Best of Both Worlds: Improving Accuracy and Transferability by Grassmann Class Representation
Haoqi Wang
Zhizhong Li
Wayne Zhang
10
1
0
03 Aug 2023
MiDaS v3.1 -- A Model Zoo for Robust Monocular Relative Depth Estimation
R. Birkl
Diana Wofk
Matthias Muller
MDE
23
133
0
26 Jul 2023
Image Captions are Natural Prompts for Text-to-Image Models
Shiye Lei
Hao Chen
Senyang Zhang
Bo-Lu Zhao
Dacheng Tao
VLM
19
19
0
17 Jul 2023
Empirically Validating Conformal Prediction on Modern Vision Architectures Under Distribution Shift and Long-tailed Data
Kevin Kasa
Graham W. Taylor
22
2
0
03 Jul 2023
Permutation Equivariance of Transformers and Its Applications
Hengyuan Xu
Liyao Xiang
Hang Ye
Dixi Yao
Pengzhi Chu
Baochun Li
17
13
0
16 Apr 2023
DINOv2: Learning Robust Visual Features without Supervision
Maxime Oquab
Timothée Darcet
Théo Moutakanni
Huy Q. Vo
Marc Szafraniec
...
Hervé Jégou
Julien Mairal
Patrick Labatut
Armand Joulin
Piotr Bojanowski
VLM
CLIP
SSL
41
3,011
0
14 Apr 2023
Data-Efficient Image Quality Assessment with Attention-Panel Decoder
Guanyi Qin
R. Hu
Yutao Liu
Xiawu Zheng
Haotian Liu
Xiu Li
Yan Zhang
ViT
17
59
0
11 Apr 2023
PixMIM: Rethinking Pixel Reconstruction in Masked Image Modeling
Yuan Liu
Songyang Zhang
Jiacheng Chen
Kai-xiang Chen
Dahua Lin
67
27
0
04 Mar 2023
Layer Grafted Pre-training: Bridging Contrastive Learning And Masked Image Modeling For Label-Efficient Representations
Ziyu Jiang
Yinpeng Chen
Mengchen Liu
Dongdong Chen
Xiyang Dai
Lu Yuan
Zicheng Liu
Zhangyang Wang
SSL
VLM
CLIP
22
16
0
27 Feb 2023
Scaling Vision Transformers to 22 Billion Parameters
Mostafa Dehghani
Josip Djolonga
Basil Mustafa
Piotr Padlewski
Jonathan Heek
...
Mario Luvcić
Xiaohua Zhai
Daniel Keysers
Jeremiah Harmsen
N. Houlsby
MLLM
56
569
0
10 Feb 2023
Out of Distribution Performance of State of Art Vision Model
Salman Rahman
W. Lee
18
2
0
25 Jan 2023
Representation Separation for Semantic Segmentation with Vision Transformers
Yuanduo Hong
Huihui Pan
Weichao Sun
Xinghu Yu
Huijun Gao
ViT
16
5
0
28 Dec 2022
Rethinking Vision Transformers for MobileNet Size and Speed
Yanyu Li
Ju Hu
Yang Wen
Georgios Evangelidis
Kamyar Salahi
Yanzhi Wang
Sergey Tulyakov
Jian Ren
ViT
21
157
0
15 Dec 2022
Co-training
2
L
2^L
2
L
Submodels for Visual Recognition
Hugo Touvron
Matthieu Cord
Maxime Oquab
Piotr Bojanowski
Jakob Verbeek
Hervé Jégou
VLM
22
9
0
09 Dec 2022
Deep Incubation: Training Large Models by Divide-and-Conquering
Zanlin Ni
Yulin Wang
Jiangwei Yu
Haojun Jiang
Yu Cao
Gao Huang
VLM
13
11
0
08 Dec 2022
Learning Imbalanced Data with Vision Transformers
Zhengzhuo Xu
R. Liu
Shuo Yang
Zenghao Chai
Chun Yuan
19
31
0
05 Dec 2022
1
2
Next