Training data-efficient image transformers & distillation through attention

23 December 2020

Alexandre Sablayrolles

Hervé Jégou

ViT

Papers citing "Training data-efficient image transformers & distillation through attention"

50 / 1,093 papers shown

Natural Color Fool: Towards Boosting Black-box Unrestricted Attacks Shengming Yuan Qilong Zhang Lianli Gao Yaya Cheng Jingkuan Song AAML 22 42 0 05 Oct 2022
MOAT: Alternating Mobile Convolution and Attention Brings Strong Vision Models Chenglin Yang Siyuan Qiao Qihang Yu Xiaoding Yuan Yukun Zhu Alan Yuille Hartwig Adam Liang-Chieh Chen ViT MoE 33 58 0 04 Oct 2022
Implicit Warping for Animation with Image Sets Arun Mallya Ting-Chun Wang Ming-Yu Liu VGen 116 41 0 04 Oct 2022
Bridged Transformer for Vision and Point Cloud 3D Object Detection Yikai Wang Tengqi Ye Lele Cao Wen-bing Huang Fuchun Sun Fengxiang He Dacheng Tao ViT 35 34 0 04 Oct 2022
Introducing Vision Transformer for Alzheimer's Disease classification task with 3D input Zilun Zhang Farzad Khalvati MedIm ViT 20 9 0 03 Oct 2022
Early or Late Fusion Matters: Efficient RGB-D Fusion in Vision Transformers for 3D Object Recognition Georgios Tziafas H. Kasaei ViT 37 10 0 03 Oct 2022
Learning Hierarchical Image Segmentation For Recognition and By Recognition Tsung-Wei Ke Sangwoo Mo Stella X. Yu VLM 29 9 0 01 Oct 2022
Where Should I Spend My FLOPS? Efficiency Evaluations of Visual Pre-training Methods Skanda Koppula Yazhe Li Evan Shelhamer Andrew Jaegle Nikhil Parthasarathy Relja Arandjelović João Carreira Olivier J. Hénaff 30 9 0 30 Sep 2022
Rethinking skip connection model as a learnable Markov chain Dengsheng Chen Jie Hu Wenwen Qiang Xiaoming Wei Enhua Wu BDL 14 1 0 30 Sep 2022
Dual Progressive Transformations for Weakly Supervised Semantic Segmentation Dong Huo Yukun Su Qingyao Wu ViT 23 4 0 30 Sep 2022
Effective Vision Transformer Training: A Data-Centric Perspective Benjia Zhou Pichao Wang Jun Wan Yan-Ni Liang Fan Wang 26 5 0 29 Sep 2022
Dilated Neighborhood Attention Transformer Ali Hassani Humphrey Shi ViT MedIm 28 68 0 29 Sep 2022
Attacking Compressed Vision Transformers Swapnil Parekh Devansh Shah Pratyush Shukla AAML 19 1 0 28 Sep 2022
FreeSeg: Free Mask from Interpretable Contrastive Language-Image Pretraining for Semantic Segmentation Yi Li Huifeng Yao Hualiang Wang X. Li ISeg VLM 35 2 0 27 Sep 2022
Greybox XAI: a Neural-Symbolic learning framework to produce interpretable predictions for image classification Adrien Bennetot Gianni Franchi Javier Del Ser Raja Chatila Natalia Díaz Rodríguez AAML 25 29 0 26 Sep 2022
From One to Many: Dynamic Cross Attention Networks for LiDAR and Camera Fusion Rui Wan Shuangjie Xu Wei Wu Xiaoyi Zou Tongyi Cao 3DPC 14 4 0 25 Sep 2022
Self-Supervised Masked Convolutional Transformer Block for Anomaly Detection Neelu Madan Nicolae-Cătălin Ristea Radu Tudor Ionescu Kamal Nasrollahi F. Khan T. Moeslund M. Shah ViT MedIm 258 61 0 25 Sep 2022
Swin2SR: SwinV2 Transformer for Compressed Image Super-Resolution and Restoration Marcos V. Conde Ui-Jin Choi Maxime Burchi Radu Timofte ViT 49 135 0 22 Sep 2022
Benchmarking and Analyzing 3D Human Pose and Shape Estimation Beyond Algorithms Huijuan Pang Zhongang Cai Lei Yang Tianwei Zhang Ziwei Liu 3DH 49 28 0 21 Sep 2022
PicT: A Slim Weakly Supervised Vision Transformer for Pavement Distress Classification Wenhao Tang Shengyue Huang Xiaoxian Zhang Luwen Huangfu ViT 37 2 0 21 Sep 2022
An Efficient End-to-End Transformer with Progressive Tri-modal Attention for Multi-modal Emotion Recognition Yang Wu Pai Peng Zhenyu Zhang Yanyan Zhao Bing Qin 17 1 0 20 Sep 2022
Dynamic Graph Message Passing Networks for Visual Recognition Li Zhang Mohan Chen Anurag Arnab Xiangyang Xue Philip H. S. Torr GNN 29 1 0 20 Sep 2022
Graph Reasoning Transformer for Image Parsing Dong Zhang Jinhui Tang Kwang-Ting Cheng ViT 24 16 0 20 Sep 2022
TODE-Trans: Transparent Object Depth Estimation with Transformer Kan Chen Shaochen Wang Beihao Xia Dongxu Li Zheng Kan Bin Li ViT 19 15 0 18 Sep 2022
PPT: token-Pruned Pose Transformer for monocular and multi-view human pose estimation Haoyu Ma Zhe Wang Yifei Chen Deying Kong Liangjian Chen Xingwei Liu Xiangyi Yan Hao Tang Xiaohui Xie ViT 35 47 0 16 Sep 2022
A Light Recipe to Train Robust Vision Transformers Edoardo Debenedetti Vikash Sehwag Prateek Mittal ViT 26 68 0 15 Sep 2022
Beat Transformer: Demixed Beat and Downbeat Tracking with Dilated Self-Attention Jingwei Zhao Gus Xia Ye Wang 21 18 0 15 Sep 2022
On the interplay of adversarial robustness and architecture components: patches, convolution and attention Francesco Croce Matthias Hein 41 6 0 14 Sep 2022
Revisiting Crowd Counting: State-of-the-art, Trends, and Future Perspectives Muhammad Asif Khan Hamid Menouar R. Hamila HAI 31 56 0 14 Sep 2022
PSAQ-ViT V2: Towards Accurate and General Data-Free Quantization for Vision Transformers Zhikai Li Mengjuan Chen Junrui Xiao Qingyi Gu ViT MQ 43 33 0 13 Sep 2022
FP8 Formats for Deep Learning Paulius Micikevicius Dusan Stosic N. Burgess Marius Cornea Pradeep Dubey ... Naveen Mellempudi S. Oberman M. Shoeybi Michael Siu Hao Wu BDL VLM MQ 69 121 0 12 Sep 2022
Exploring Target Representations for Masked Autoencoders Xingbin Liu Jinghao Zhou Tao Kong Xianming Lin Rongrong Ji 81 50 0 08 Sep 2022
Transformer-CNN Cohort: Semi-supervised Semantic Segmentation by the Best of Both Students Xueye Zheng Yuan Luo Hao Wang Chong Fu Lin Wang ViT 36 17 0 06 Sep 2022
An Empirical Study of End-to-End Video-Language Transformers with Masked Visual Modeling Tsu-jui Fu Linjie Li Zhe Gan Kevin Qinghong Lin William Yang Wang Lijuan Wang Zicheng Liu VLM 19 63 0 04 Sep 2022
TokenCut: Segmenting Objects in Images and Videos with Self-supervised Transformer and Normalized Cut Yangtao Wang Xiaoke Shen Yuan Yuan Yuming Du Maomao Li S. Hu James L. Crowley Dominique Vaufreydaz VOS ViT 15 76 0 01 Sep 2022
MRL: Learning to Mix with Attention and Convolutions Shlok Mohta Hisahiro Suganuma Yoshiki Tanaka 20 2 0 30 Aug 2022
SB-SSL: Slice-Based Self-Supervised Transformers for Knee Abnormality Classification from MRI Sara Atito Syed Muhammad Anwar Muhammad Awais Josef Kittler ViT MedIm 16 12 0 29 Aug 2022
An Access Control Method with Secret Key for Semantic Segmentation Models Teru Nagamori Ryota Iijima Hitoshi Kiya 24 0 0 28 Aug 2022
MaskCLIP: Masked Self-Distillation Advances Contrastive Language-Image Pretraining Xiaoyi Dong Jianmin Bao Yinglin Zheng Ting Zhang Dongdong Chen ... Weiming Zhang Lu Yuan Dong Chen Fang Wen Nenghai Yu CLIP VLM 40 157 0 25 Aug 2022
Masked Autoencoders Enable Efficient Knowledge Distillers Yutong Bai Zeyu Wang Junfei Xiao Chen Wei Huiyu Wang Alan Yuille Yuyin Zhou Cihang Xie CLL 24 39 0 25 Aug 2022
Improved Zero-Shot Audio Tagging & Classification with Patchout Spectrogram Transformers Paul Primus Gerhard Widmer VLM 17 5 0 24 Aug 2022
Federated Self-Supervised Contrastive Learning and Masked Autoencoder for Dermatological Disease Diagnosis Yawen Wu Dewen Zeng Zhepeng Wang Yi Sheng Lei Yang A. James Yiyu Shi Jingtong Hu 18 7 0 24 Aug 2022
Efficient Attention-free Video Shift Transformers Adrian Bulat Brais Martínez Georgios Tzimiropoulos ViT 27 1 0 23 Aug 2022
How good are deep models in understanding the generated images? Ali Borji OOD 21 6 0 23 Aug 2022
A Unified Analysis of Mixed Sample Data Augmentation: A Loss Function Perspective Chanwoo Park Sangdoo Yun Sanghyuk Chun AAML 18 32 0 21 Aug 2022
A Multi-Head Model for Continual Learning via Out-of-Distribution Replay Gyuhak Kim Zixuan Ke Bin Liu VLM CLL OODD 15 29 0 20 Aug 2022
Exploring Adversarial Robustness of Vision Transformers in the Spectral Perspective Gihyun Kim Juyeop Kim Jong-Seok Lee AAML ViT 18 4 0 20 Aug 2022
Accelerating Vision Transformer Training via a Patch Sampling Schedule Bradley McDanel C. Huynh ViT 25 1 0 19 Aug 2022
Improved Image Classification with Token Fusion Keong-Hun Choi Jin-Woo Kim Yaolong Wang J. Ha ViT 19 0 0 19 Aug 2022
GSRFormer: Grounded Situation Recognition Transformer with Alternate Semantic Attention Refinement Zhi-Qi Cheng Qianwen Dai Siyao Li Teruko Mitamura Alexander G. Hauptmann 16 34 0 18 Aug 2022