Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2203.09795
Cited By
Three things everyone should know about Vision Transformers
18 March 2022
Hugo Touvron
Matthieu Cord
Alaaeldin El-Nouby
Jakob Verbeek
Hervé Jégou
ViT
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Three things everyone should know about Vision Transformers"
26 / 26 papers shown
Title
Vision-LSTM: xLSTM as Generic Vision Backbone
Benedikt Alkin
M. Beck
Korbinian Poppel
Sepp Hochreiter
Johannes Brandstetter
VLM
56
39
0
24 Feb 2025
A Federated Learning-Friendly Approach for Parameter-Efficient Fine-Tuning of SAM in 3D Segmentation
Mothilal Asokan
Difei Gao
Joya Chen
Mike Zheng Shou
FedML
MedIm
36
1
0
31 Jul 2024
LookHere: Vision Transformers with Directed Attention Generalize and Extrapolate
A. Fuller
Daniel G. Kyrollos
Yousef Yassin
James R. Green
34
2
0
22 May 2024
Low-Rank Adaptation of Time Series Foundational Models for Out-of-Domain Modality Forecasting
Divij Gupta
Anubhav Bhatti
Surajsinh Parmar
Chen Dan
Yuwei Liu
Bingjie Shen
San Lee
AI4TS
30
2
0
16 May 2024
How to build the best medical image segmentation algorithm using foundation models: a comprehensive empirical study with Segment Anything Model
Han Gu
Haoyu Dong
Jichen Yang
Maciej Mazurowski
MedIm
VLM
75
11
0
15 Apr 2024
OnDev-LCT: On-Device Lightweight Convolutional Transformers towards federated learning
Chu Myaet Thwal
Minh N. H. Nguyen
Ye Lin Tun
Seongjin Kim
My T. Thai
Choong Seon Hong
49
5
0
22 Jan 2024
SCHEME: Scalable Channel Mixer for Vision Transformers
Deepak Sridhar
Yunsheng Li
Nuno Vasconcelos
18
0
0
01 Dec 2023
No Representation Rules Them All in Category Discovery
S. Vaze
Andrea Vedaldi
Andrew Zisserman
OOD
24
31
0
28 Nov 2023
3D Transformer based on deformable patch location for differential diagnosis between Alzheimer's disease and Frontotemporal dementia
H. Nguyen
Michael Clement
Boris Mansencal
Pierrick Coupé
MedIm
21
0
0
06 Sep 2023
A Parameter-efficient Multi-subject Model for Predicting fMRI Activity
Connor Lane
Gregory Kiar
19
2
0
04 Aug 2023
A Novel Site-Agnostic Multimodal Deep Learning Model to Identify Pro-Eating Disorder Content on Social Media
J. Feldman
17
0
0
06 Jul 2023
ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
Peng Wang
Shijie Wang
Junyang Lin
Shuai Bai
Xiaohuan Zhou
Jingren Zhou
Xinggang Wang
Chang Zhou
VLM
MLLM
ObjD
16
113
0
18 May 2023
Parameter-Efficient Fine-Tuning for Medical Image Analysis: The Missed Opportunity
Raman Dutt
Linus Ericsson
Pedro Sanchez
Sotirios A. Tsaftaris
Timothy M. Hospedales
MedIm
13
50
0
14 May 2023
Surgical Fine-Tuning Improves Adaptation to Distribution Shifts
Yoonho Lee
Annie S. Chen
Fahim Tajwar
Ananya Kumar
Huaxiu Yao
Percy Liang
Chelsea Finn
OOD
38
195
0
20 Oct 2022
ViT-DD: Multi-Task Vision Transformer for Semi-Supervised Driver Distraction Detection
Yunsheng Ma
Ziran Wang
ViT
33
13
0
19 Sep 2022
Dual Vision Transformer
Ting Yao
Yehao Li
Yingwei Pan
Yu Wang
Xiaoping Zhang
Tao Mei
ViT
141
75
0
11 Jul 2022
Reading and Writing: Discriminative and Generative Modeling for Self-Supervised Text Recognition
Mingkun Yang
Minghui Liao
Pu Lu
Jing Wang
Shenggao Zhu
Hualin Luo
Qingzhen Tian
X. Bai
SSL
27
55
0
01 Jul 2022
Masked Autoencoders Are Scalable Vision Learners
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT
TPM
258
7,412
0
11 Nov 2021
Non-deep Networks
Ankit Goyal
Alexey Bochkovskiy
Jia Deng
V. Koltun
103
69
0
14 Oct 2021
ResNet strikes back: An improved training procedure in timm
Ross Wightman
Hugo Touvron
Hervé Jégou
AI4TS
207
484
0
01 Oct 2021
MLP-Mixer: An all-MLP Architecture for Vision
Ilya O. Tolstikhin
N. Houlsby
Alexander Kolesnikov
Lucas Beyer
Xiaohua Zhai
...
Andreas Steiner
Daniel Keysers
Jakob Uszkoreit
Mario Lucic
Alexey Dosovitskiy
239
2,592
0
04 May 2021
Emerging Properties in Self-Supervised Vision Transformers
Mathilde Caron
Hugo Touvron
Ishan Misra
Hervé Jégou
Julien Mairal
Piotr Bojanowski
Armand Joulin
292
5,761
0
29 Apr 2021
Transformer in Transformer
Kai Han
An Xiao
Enhua Wu
Jianyuan Guo
Chunjing Xu
Yunhe Wang
ViT
282
1,518
0
27 Feb 2021
Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions
Wenhai Wang
Enze Xie
Xiang Li
Deng-Ping Fan
Kaitao Song
Ding Liang
Tong Lu
Ping Luo
Ling Shao
ViT
263
3,604
0
24 Feb 2021
Zero-Shot Text-to-Image Generation
Aditya A. Ramesh
Mikhail Pavlov
Gabriel Goh
Scott Gray
Chelsea Voss
Alec Radford
Mark Chen
Ilya Sutskever
VLM
253
4,764
0
24 Feb 2021
RepVGG: Making VGG-style ConvNets Great Again
Xiaohan Ding
X. Zhang
Ningning Ma
Jungong Han
Guiguang Ding
Jian-jun Sun
117
1,531
0
11 Jan 2021
1