2106.10270
Cited By
How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers
18 June 2021
Andreas Steiner
Alexander Kolesnikov
Xiaohua Zhai
Ross Wightman
Jakob Uszkoreit
Lucas Beyer
ViT
Papers citing
"How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers"
50 / 415 papers shown
Learning Probabilistic Symmetrization for Architecture Agnostic Equivariance
Jinwoo Kim
Tien Dat Nguyen
Ayhan Suleymanzade
Hyeokjun An
Seunghoon Hong
37
22
0
05 Jun 2023
Content-aware Token Sharing for Efficient Semantic Segmentation with Vision Transformers
Chenyang Lu
Daan de Geus
Gijs Dubbelman
ViT
15
20
0
03 Jun 2023
In or Out? Fixing ImageNet Out-of-Distribution Detection Evaluation
Julian Bitterwolf
Maximilian Müller
Matthias Hein
OODD
11
83
0
01 Jun 2023
Diffused Redundancy in Pre-trained Representations
Vedant Nanda
Till Speicher
John P. Dickerson
S. Feizi
Krishna P. Gummadi
Adrian Weller
SSL
16
2
0
31 May 2023
Zero-TPrune: Zero-Shot Token Pruning through Leveraging of the Attention Graph in Pre-Trained Transformers
Hongjie Wang
Bhishma Dedhia
N. Jha
ViT
VLM
28
25
0
27 May 2023
Sharpness-Aware Minimization Leads to Low-Rank Features
Maksym Andriushchenko
Dara Bahri
H. Mobahi
Nicolas Flammarion
AAML
25
25
0
25 May 2023
VanillaKD: Revisit the Power of Vanilla Knowledge Distillation from Small Scale to Large Scale
Zhiwei Hao
Jianyuan Guo
Kai Han
Han Hu
Chang Xu
Yunhe Wang
17
14
0
25 May 2023
Weakly-Supervised Learning of Visual Relations in Multimodal Pretraining
Emanuele Bugliarello
Aida Nematzadeh
Lisa Anne Hendricks
SSL
22
5
0
23 May 2023
Target-Aware Generative Augmentations for Single-Shot Adaptation
Kowshik Thopalli
Rakshith Subramanyam
P. Turaga
Jayaraman J. Thiagarajan
TTA
37
5
0
22 May 2023
Getting ViT in Shape: Scaling Laws for Compute-Optimal Model Design
Ibrahim M. Alabdulmohsin
Xiaohua Zhai
Alexander Kolesnikov
Lucas Beyer
VLM
19
54
0
22 May 2023
Multimodal Web Navigation with Instruction-Finetuned Foundation Models
Hiroki Furuta
Kuang-Huei Lee
Ofir Nachum
Yutaka Matsuo
Aleksandra Faust
S. Gu
Izzeddin Gur
LM&Ro
14
90
0
19 May 2023
Measuring Progress in Fine-grained Vision-and-Language Understanding
Emanuele Bugliarello
Laurent Sartran
Aishwarya Agrawal
Lisa Anne Hendricks
Aida Nematzadeh
VLM
20
22
0
12 May 2023
CrAFT: Compression-Aware Fine-Tuning for Efficient Visual Task Adaptation
J. Heo
S. Azizi
A. Fayyazi
Massoud Pedram
23
0
0
08 May 2023
Great Models Think Alike: Improving Model Reliability via Inter-Model Latent Agreement
Ailin Deng
Miao Xiong
Bryan Hooi
17
6
0
02 May 2023
Modality-invariant Visual Odometry for Embodied Vision
Marius Memmel
Roman Bachmann
Amir Zamir
54
8
0
29 Apr 2023
SoGAR: Self-supervised Spatiotemporal Attention-based Social Group Activity Recognition
N. V. R. Chappa
Pha Nguyen
Alec Nelson
Han-Seok Seo
Xin Li
P. Dobbs
Khoa Luu
ViT
26
8
0
27 Apr 2023
Hint-Aug: Drawing Hints from Foundation Vision Transformers Towards Boosted Few-Shot Parameter-Efficient Tuning
Zhongzhi Yu
Shang Wu
Y. Fu
Shunyao Zhang
Yingyan Lin
21
6
0
25 Apr 2023
A Cookbook of Self-Supervised Learning
Randall Balestriero
Mark Ibrahim
Vlad Sobal
Ari S. Morcos
Shashank Shekhar
...
Pierre Fernandez
Amir Bar
Hamed Pirsiavash
Yann LeCun
Micah Goldblum
SyDa
FedML
SSL
31
270
0
24 Apr 2023
End-to-End Spatio-Temporal Action Localisation with Video Transformers
A. Gritsenko
Xuehan Xiong
Josip Djolonga
Mostafa Dehghani
Chen Sun
Mario Lucic
Cordelia Schmid
Anurag Arnab
ViT
32
13
0
24 Apr 2023
DINOv2: Learning Robust Visual Features without Supervision
Maxime Oquab
Timothée Darcet
Théo Moutakanni
Huy Q. Vo
Marc Szafraniec
...
Hervé Jégou
Julien Mairal
Patrick Labatut
Armand Joulin
Piotr Bojanowski
VLM
CLIP
SSL
23
2,983
0
14 Apr 2023
ImageNet-Hard: The Hardest Images Remaining from a Study of the Power of Zoom and Spatial Biases in Image Classification
Mohammad Reza Taesiri
Giang Nguyen
Sarra Habchi
C. Bezemer
Anh Totti Nguyen
VLM
19
20
0
11 Apr 2023
On Efficient Training of Large-Scale Deep Learning Models: A Literature Review
Li Shen
Yan Sun
Zhiyuan Yu
Liang Ding
Xinmei Tian
Dacheng Tao
VLM
22
39
0
07 Apr 2023
Linking Representations with Multimodal Contrastive Learning
Abhishek Arora
Xinmei Yang
Shao-Yu Jheng
Melissa Dell
19
1
0
07 Apr 2023
ERM++: An Improved Baseline for Domain Generalization
Piotr Teterwak
Kuniaki Saito
Theodoros Tsiligkaridis
Kate Saenko
Bryan A. Plummer
OOD
18
9
0
04 Apr 2023
WeakTr: Exploring Plain Vision Transformer for Weakly-supervised Semantic Segmentation
Liang Zhu
Yingyue Li
Jiemin Fang
Yan Liu
Hao Xin
Wenyu Liu
Xinggang Wang
ViT
15
27
0
03 Apr 2023
A Study of Autoregressive Decoders for Multi-Tasking in Computer Vision
Lucas Beyer
Bo Wan
Gagan Madan
Filip Pavetić
Andreas Steiner
...
Emanuele Bugliarello
Xiao Wang
Qihang Yu
Liang-Chieh Chen
Xiaohua Zhai
46
8
0
30 Mar 2023
Towards Understanding the Effect of Pretraining Label Granularity
Guanzhe Hong
Yin Cui
Ariel Fuxman
Stanley H. Chan
Enming Luo
11
2
0
29 Mar 2023
Sigmoid Loss for Language Image Pre-Training
Xiaohua Zhai
Basil Mustafa
Alexander Kolesnikov
Lucas Beyer
CLIP
VLM
14
917
0
27 Mar 2023
Vision Models Can Be Efficiently Specialized via Few-Shot Task-Aware Compression
Denis Kuznedelev
Soroush Tabesh
Kimia Noorbakhsh
Elias Frantar
Sara Beery
Eldar Kurtic
Dan Alistarh
MQ
VLM
13
2
0
25 Mar 2023
Train/Test-Time Adaptation with Retrieval
L. Zancato
Alessandro Achille
Tian Yu Liu
Matthew Trager
Pramuditha Perera
Stefano Soatto
TTA
OOD
8
11
0
25 Mar 2023
A Closer Look at Model Adaptation using Feature Distortion and Simplicity Bias
Puja Trivedi
Danai Koutra
Jayaraman J. Thiagarajan
AAML
24
17
0
23 Mar 2023
The effectiveness of MAE pre-pretraining for billion-scale pretraining
Mannat Singh
Quentin Duval
Kalyan Vasudev Alwala
Haoqi Fan
Vaibhav Aggarwal
...
Piotr Dollár
Christoph Feichtenhofer
Ross B. Girshick
Rohit Girdhar
Ishan Misra
LRM
102
62
0
23 Mar 2023
Instance-Conditioned GAN Data Augmentation for Representation Learning
Pietro Astolfi
Arantxa Casanova
Jakob Verbeek
Pascal Vincent
Adriana Romero Soriano
M. Drozdzal
11
6
0
16 Mar 2023
High-level Feature Guided Decoding for Semantic Segmentation
Ye Huang
Di Kang
Shenghua Gao
Wen Li
Lixin Duan
18
0
0
15 Mar 2023
Efficiently Training Vision Transformers on Structural MRI Scans for Alzheimer's Disease Detection
Nikhil J. Dhinagar
Sophia I Thomopoulos
Emily Laltoo
Paul M. Thompson
DiffM
MedIm
37
16
0
14 Mar 2023
Revisiting Class-Incremental Learning with Pre-Trained Models: Generalizability and Adaptivity are All You Need
Da-Wei Zhou
Han-Jia Ye
De-Chuan Zhan
Ziwei Liu
CLL
28
99
0
13 Mar 2023
Run, Don't Walk: Chasing Higher FLOPS for Faster Neural Networks
Jierun Chen
Shiu-hong Kao
Hao He
Weipeng Zhuo
Song Wen
Chul-Ho Lee
Shueng-Han Gary Chan
OOD
27
679
0
07 Mar 2023
SPARTAN: Self-supervised Spatiotemporal Transformers Approach to Group Activity Recognition
N. V. R. Chappa
Pha Nguyen
Alec Nelson
Han-Seok Seo
Xin Li
P. Dobbs
Khoa Luu
ViT
40
15
0
06 Mar 2023
Training-Free Acceleration of ViTs with Delayed Spatial Merging
J. Heo
Seyedarmin Azizi
A. Fayyazi
Massoud Pedram
33
3
0
04 Mar 2023
Data-Efficient Training of CNNs and Transformers with Coresets: A Stability Perspective
Animesh Gupta
Irtiza Hassan
Dilip K. Prasad
D. K. Gupta
13
2
0
03 Mar 2023
Dropout Reduces Underfitting
Zhuang Liu
Zhi-Qin John Xu
Joseph Jin
Zhiqiang Shen
Trevor Darrell
21
35
0
02 Mar 2023
Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning
Antoine Yang
Arsha Nagrani
Paul Hongsuck Seo
Antoine Miech
Jordi Pont-Tuset
Ivan Laptev
Josef Sivic
Cordelia Schmid
AI4TS
VLM
18
219
0
27 Feb 2023
TBFormer: Two-Branch Transformer for Image Forgery Localization
Yaqi Liu
Binbin Lv
Xin Jin
Xiaoyue Chen
Xiaokun Zhang
ViT
18
25
0
25 Feb 2023
A framework for benchmarking class-out-of-distribution detection and its application to ImageNet
Ido Galil
Mohammed Dabbah
Ran El-Yaniv
UQCV
13
28
0
23 Feb 2023
What Can We Learn From The Selective Prediction And Uncertainty Estimation Performance Of 523 Imagenet Classifiers
Ido Galil
Mohammed Dabbah
Ran El-Yaniv
UQCV
19
24
0
23 Feb 2023
Steerable Equivariant Representation Learning
Sangnie Bhardwaj
Willie McClinton
Tongzhou Wang
Guillaume Lajoie
Chen Sun
Phillip Isola
Dilip Krishnan
OOD
LLMSV
21
5
0
22 Feb 2023
Gradient-based Wang-Landau Algorithm: A Novel Sampler for Output Distribution of Neural Networks over the Input Space
Weitang Liu
Ying-Wai Li
Yi-Zhuang You
Jingbo Shang
6
1
0
19 Feb 2023
Conformers are All You Need for Visual Speech Recognition
Oscar Chang
H. Liao
Dmitriy Serdyuk
Ankit Parag Shah
Olivier Siohan
VLM
37
14
0
17 Feb 2023
Efficiency 360: Efficient Vision Transformers
Badri N. Patro
Vijay Srinivas Agneeswaran
19
6
0
16 Feb 2023
Tuning computer vision models with task rewards
André Susano Pinto
Alexander Kolesnikov
Yuge Shi
Lucas Beyer
Xiaohua Zhai
VLM
20
40
0
16 Feb 2023