How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers

18 June 2021

Papers citing "How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers"


Learning Probabilistic Symmetrization for Architecture Agnostic Equivariance Jinwoo Kim Tien Dat Nguyen Ayhan Suleymanzade Hyeokjun An Seunghoon Hong 37 22 0 05 Jun 2023
Content-aware Token Sharing for Efficient Semantic Segmentation with Vision Transformers Chenyang Lu Daan de Geus Gijs Dubbelman ViT 15 20 0 03 Jun 2023
In or Out? Fixing ImageNet Out-of-Distribution Detection Evaluation Julian Bitterwolf Maximilian Müller Matthias Hein OODD 11 83 0 01 Jun 2023
Diffused Redundancy in Pre-trained Representations Vedant Nanda Till Speicher John P. Dickerson S. Feizi Krishna P. Gummadi Adrian Weller SSL 16 2 0 31 May 2023
Zero-TPrune: Zero-Shot Token Pruning through Leveraging of the Attention Graph in Pre-Trained Transformers Hongjie Wang Bhishma Dedhia N. Jha ViT VLM 28 25 0 27 May 2023
Sharpness-Aware Minimization Leads to Low-Rank Features Maksym Andriushchenko Dara Bahri H. Mobahi Nicolas Flammarion AAML 25 25 0 25 May 2023
VanillaKD: Revisit the Power of Vanilla Knowledge Distillation from Small Scale to Large Scale Zhiwei Hao Jianyuan Guo Kai Han Han Hu Chang Xu Yunhe Wang 17 14 0 25 May 2023
Weakly-Supervised Learning of Visual Relations in Multimodal Pretraining Emanuele Bugliarello Aida Nematzadeh Lisa Anne Hendricks SSL 22 5 0 23 May 2023
Target-Aware Generative Augmentations for Single-Shot Adaptation Kowshik Thopalli Rakshith Subramanyam P. Turaga Jayaraman J. Thiagarajan TTA 37 5 0 22 May 2023
Getting ViT in Shape: Scaling Laws for Compute-Optimal Model Design Ibrahim M. Alabdulmohsin Xiaohua Zhai Alexander Kolesnikov Lucas Beyer VLM 19 54 0 22 May 2023
Multimodal Web Navigation with Instruction-Finetuned Foundation Models Hiroki Furuta Kuang-Huei Lee Ofir Nachum Yutaka Matsuo Aleksandra Faust S. Gu Izzeddin Gur LM&Ro 14 90 0 19 May 2023
Measuring Progress in Fine-grained Vision-and-Language Understanding Emanuele Bugliarello Laurent Sartran Aishwarya Agrawal Lisa Anne Hendricks Aida Nematzadeh VLM 20 22 0 12 May 2023
CrAFT: Compression-Aware Fine-Tuning for Efficient Visual Task Adaptation J. Heo S. Azizi A. Fayyazi Massoud Pedram 23 0 0 08 May 2023
Great Models Think Alike: Improving Model Reliability via Inter-Model Latent Agreement Ailin Deng Miao Xiong Bryan Hooi 17 6 0 02 May 2023
Modality-invariant Visual Odometry for Embodied Vision Marius Memmel Roman Bachmann Amir Zamir 54 8 0 29 Apr 2023
SoGAR: Self-supervised Spatiotemporal Attention-based Social Group Activity Recognition N. V. R. Chappa Pha Nguyen Alec Nelson Han-Seok Seo Xin Li P. Dobbs Khoa Luu ViT 26 8 0 27 Apr 2023
Hint-Aug: Drawing Hints from Foundation Vision Transformers Towards Boosted Few-Shot Parameter-Efficient Tuning Zhongzhi Yu Shang Wu Y. Fu Shunyao Zhang Yingyan Lin 21 6 0 25 Apr 2023
A Cookbook of Self-Supervised Learning Randall Balestriero Mark Ibrahim Vlad Sobal Ari S. Morcos Shashank Shekhar ... Pierre Fernandez Amir Bar Hamed Pirsiavash Yann LeCun Micah Goldblum SyDa FedML SSL 31 270 0 24 Apr 2023
End-to-End Spatio-Temporal Action Localisation with Video Transformers A. Gritsenko Xuehan Xiong Josip Djolonga Mostafa Dehghani Chen Sun Mario Lucic Cordelia Schmid Anurag Arnab ViT 32 13 0 24 Apr 2023
DINOv2: Learning Robust Visual Features without Supervision Maxime Oquab Timothée Darcet Théo Moutakanni Huy Q. Vo Marc Szafraniec ... Hervé Jégou Julien Mairal Patrick Labatut Armand Joulin Piotr Bojanowski VLM CLIP SSL 23 2,983 0 14 Apr 2023
ImageNet-Hard: The Hardest Images Remaining from a Study of the Power of Zoom and Spatial Biases in Image Classification Mohammad Reza Taesiri Giang Nguyen Sarra Habchi C. Bezemer Anh Totti Nguyen VLM 19 20 0 11 Apr 2023
On Efficient Training of Large-Scale Deep Learning Models: A Literature Review Li Shen Yan Sun Zhiyuan Yu Liang Ding Xinmei Tian Dacheng Tao VLM 22 39 0 07 Apr 2023
Linking Representations with Multimodal Contrastive Learning Abhishek Arora Xinmei Yang Shao-Yu Jheng Melissa Dell 19 1 0 07 Apr 2023
ERM++: An Improved Baseline for Domain Generalization Piotr Teterwak Kuniaki Saito Theodoros Tsiligkaridis Kate Saenko Bryan A. Plummer OOD 18 9 0 04 Apr 2023
WeakTr: Exploring Plain Vision Transformer for Weakly-supervised Semantic Segmentation Liang Zhu Yingyue Li Jiemin Fang Yan Liu Hao Xin Wenyu Liu Xinggang Wang ViT 15 27 0 03 Apr 2023
A Study of Autoregressive Decoders for Multi-Tasking in Computer Vision Lucas Beyer Bo Wan Gagan Madan Filip Pavetić Andreas Steiner ... Emanuele Bugliarello Xiao Wang Qihang Yu Liang-Chieh Chen Xiaohua Zhai 46 8 0 30 Mar 2023
Towards Understanding the Effect of Pretraining Label Granularity Guanzhe Hong Yin Cui Ariel Fuxman Stanley H. Chan Enming Luo 11 2 0 29 Mar 2023
Sigmoid Loss for Language Image Pre-Training Xiaohua Zhai Basil Mustafa Alexander Kolesnikov Lucas Beyer CLIP VLM 14 917 0 27 Mar 2023
Vision Models Can Be Efficiently Specialized via Few-Shot Task-Aware Compression Denis Kuznedelev Soroush Tabesh Kimia Noorbakhsh Elias Frantar Sara Beery Eldar Kurtic Dan Alistarh MQ VLM 13 2 0 25 Mar 2023
Train/Test-Time Adaptation with Retrieval L. Zancato Alessandro Achille Tian Yu Liu Matthew Trager Pramuditha Perera Stefano Soatto TTA OOD 8 11 0 25 Mar 2023
A Closer Look at Model Adaptation using Feature Distortion and Simplicity Bias Puja Trivedi Danai Koutra Jayaraman J. Thiagarajan AAML 24 17 0 23 Mar 2023
The effectiveness of MAE pre-pretraining for billion-scale pretraining Mannat Singh Quentin Duval Kalyan Vasudev Alwala Haoqi Fan Vaibhav Aggarwal ... Piotr Dollár Christoph Feichtenhofer Ross B. Girshick Rohit Girdhar Ishan Misra LRM 102 62 0 23 Mar 2023
Instance-Conditioned GAN Data Augmentation for Representation Learning Pietro Astolfi Arantxa Casanova Jakob Verbeek Pascal Vincent Adriana Romero Soriano M. Drozdzal 11 6 0 16 Mar 2023
High-level Feature Guided Decoding for Semantic Segmentation Ye Huang Di Kang Shenghua Gao Wen Li Lixin Duan 18 0 0 15 Mar 2023
Efficiently Training Vision Transformers on Structural MRI Scans for Alzheimer's Disease Detection Nikhil J. Dhinagar Sophia I Thomopoulos Emily Laltoo Paul M. Thompson DiffM MedIm 37 16 0 14 Mar 2023
Revisiting Class-Incremental Learning with Pre-Trained Models: Generalizability and Adaptivity are All You Need Da-Wei Zhou Han-Jia Ye De-Chuan Zhan Ziwei Liu CLL 28 99 0 13 Mar 2023
Run, Don't Walk: Chasing Higher FLOPS for Faster Neural Networks Jierun Chen Shiu-hong Kao Hao He Weipeng Zhuo Song Wen Chul-Ho Lee Shueng-Han Gary Chan OOD 27 679 0 07 Mar 2023
SPARTAN: Self-supervised Spatiotemporal Transformers Approach to Group Activity Recognition N. V. R. Chappa Pha Nguyen Alec Nelson Han-Seok Seo Xin Li P. Dobbs Khoa Luu ViT 40 15 0 06 Mar 2023
Training-Free Acceleration of ViTs with Delayed Spatial Merging J. Heo Seyedarmin Azizi A. Fayyazi Massoud Pedram 33 3 0 04 Mar 2023
Data-Efficient Training of CNNs and Transformers with Coresets: A Stability Perspective Animesh Gupta Irtiza Hassan Dilip K. Prasad D. K. Gupta 13 2 0 03 Mar 2023
Dropout Reduces Underfitting Zhuang Liu Zhi-Qin John Xu Joseph Jin Zhiqiang Shen Trevor Darrell 21 35 0 02 Mar 2023
Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning Antoine Yang Arsha Nagrani Paul Hongsuck Seo Antoine Miech Jordi Pont-Tuset Ivan Laptev Josef Sivic Cordelia Schmid AI4TS VLM 18 219 0 27 Feb 2023
TBFormer: Two-Branch Transformer for Image Forgery Localization Yaqi Liu Binbin Lv Xin Jin Xiaoyue Chen Xiaokun Zhang ViT 18 25 0 25 Feb 2023
A framework for benchmarking class-out-of-distribution detection and its application to ImageNet Ido Galil Mohammed Dabbah Ran El-Yaniv UQCV 13 28 0 23 Feb 2023
What Can We Learn From The Selective Prediction And Uncertainty Estimation Performance Of 523 Imagenet Classifiers Ido Galil Mohammed Dabbah Ran El-Yaniv UQCV 19 24 0 23 Feb 2023
Steerable Equivariant Representation Learning Sangnie Bhardwaj Willie McClinton Tongzhou Wang Guillaume Lajoie Chen Sun Phillip Isola Dilip Krishnan OOD LLMSV 21 5 0 22 Feb 2023
Gradient-based Wang-Landau Algorithm: A Novel Sampler for Output Distribution of Neural Networks over the Input Space Weitang Liu Ying-Wai Li Yi-Zhuang You Jingbo Shang 6 1 0 19 Feb 2023
Conformers are All You Need for Visual Speech Recognition Oscar Chang H. Liao Dmitriy Serdyuk Ankit Parag Shah Olivier Siohan VLM 37 14 0 17 Feb 2023
Efficiency 360: Efficient Vision Transformers Badri N. Patro Vijay Srinivas Agneeswaran 19 6 0 16 Feb 2023
Tuning computer vision models with task rewards André Susano Pinto Alexander Kolesnikov Yuge Shi Lucas Beyer Xiaohua Zhai VLM 20 40 0 16 Feb 2023