CvT: Introducing Convolutions to Vision Transformers

IEEE International Conference on Computer Vision (ICCV), 2021

29 March 2021

Lu Yuan

Lei Zhang

ViT


Papers citing "CvT: Introducing Convolutions to Vision Transformers"

50 / 860 papers shown

ConDaFormer: Disassembled Transformer with Local Structure Enhancement for 3D Point Cloud Understanding

391

18 Dec 2023

Agent Attention: On the Integration of Softmax and Linear Attention

European Conference on Computer Vision (ECCV), 2023

Gao Huang

381

193

14 Dec 2023

Transformer-based Selective Super-Resolution for Efficient Image Refinement

179

10 Dec 2023

Graph Convolutions Enrich the Self-Attention in Transformers!

Jeongwhan Choi

401

07 Dec 2023

Class-Discriminative Attention Maps for Vision Transformers

353

04 Dec 2023

MobileUtr: Revisiting the relationship between light-weight CNN and Transformer for efficient medical image segmentation

233

04 Dec 2023

SCHEME: Scalable Channel Mixer for Vision Transformers

Deepak Sridhar

Yunsheng Li

Nuno Vasconcelos

807

01 Dec 2023

TransNeXt: Robust Foveal Visual Perception for Vision Transformers

Computer Vision and Pattern Recognition (CVPR), 2023

Dai Shi

ViT

296

261

28 Nov 2023

Beyond Visual Cues: Synchronously Exploring Target-Centric Semantics for Vision-Language Tracking

Bo Liu

374

28 Nov 2023

Advancing Vision Transformers with Group-Mix Attention

Ping Luo

328

26 Nov 2023

Pursing the Sparse Limitation of Spiking Deep Learning Structures

Jiahang Cao

Renjing Xu

206

18 Nov 2023

Vision Big Bird: Random Sparsification for Full Attention

Zhemin Zhang

Xun Gong

ViT

163

10 Nov 2023

Mini but Mighty: Finetuning ViTs with Mini Adapters

IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023

Imad Eddine Marouf

Enzo Tartaglione

Stéphane Lathuilière

177

07 Nov 2023

GTP-ViT: Efficient Vision Transformers via Graph-based Token Propagation

IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023

307

06 Nov 2023

Dense Video Captioning: A Survey of Techniques, Datasets and Evaluation Protocols

ACM Computing Surveys (ACM Comput. Surv.), 2023

Iqra Qasim

Alexander Horsch

Dilip K. Prasad

254

05 Nov 2023

Scattering Vision Transformer: Spectral Mixing Matters

Neural Information Processing Systems (NeurIPS), 2023

Badri N. Patro

Vijay Srinivas Agneeswaran

416

02 Nov 2023

Distilling Knowledge from CNN-Transformer Models for Enhanced Human Action Recognition

International Conference on Computer and Knowledge Engineering (ICCKE), 2023

Hamid Ahmadabadi

Omid Nejati Manzari

Ahmad Ayatollahi

126

02 Nov 2023

Limited Data, Unlimited Potential: A Study on ViTs Augmented by Masked Autoencoders

IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023

308

31 Oct 2023

MIST: Medical Image Segmentation Transformer with Convolutional Attention Mixing (CAM) Decoder

IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023

293

30 Oct 2023

ViR: Towards Efficient Vision Retention Backbones

175

30 Oct 2023

AViTMP: A Tracking-Specific Transformer for Single-Branch Visual Tracking

IEEE Transactions on Intelligent Vehicles (TIV), 2023

372

30 Oct 2023

TransXNet: Learning Both Global and Local Dynamics with a Dual Dynamic Token Mixer for Visual Recognition

IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2023

Chuan Wu

Yizhou Yu

ViT

551

30 Oct 2023

Exploring Shape Embedding for Cloth-Changing Person Re-Identification via 2D-3D Correspondences

ACM Multimedia (ACM MM), 2023

243

27 Oct 2023

Generalizing to Unseen Domains in Diabetic Retinopathy Classification

IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023

Chamuditha Jayanga Galappaththige

Gayal Kuruppu

Muhammad Haris Khan

OOD

327

26 Oct 2023

Bridging The Gaps Between Token Pruning and Full Pre-training via Masked Fine-tuning

Fengyuan Shi

Limin Wang

ViT

165

26 Oct 2023

Toward Flare-Free Images: A Survey

Yousef Kotp

Marwan Torki

262

22 Oct 2023

Multimodal Transformer Using Cross-Channel attention for Object Detection in Remote Sensing Images

International Conference on Information Photonics (ICIP), 2023

Bissmella Bahaduri

Zuheng Ming

Fangchen Feng

Anissa Mokraou

261

21 Oct 2023

LeTFuser: Light-weight End-to-end Transformer-Based Sensor Fusion for Autonomous Driving with Multi-Task Learning

332

19 Oct 2023

Minimalist and High-Performance Semantic Segmentation with Plain Vision Transformers

250

19 Oct 2023

Camera-LiDAR Fusion with Latent Contact for Place Recognition in Challenging Cross-Scenes

263

16 Oct 2023

Accelerating Vision Transformers Based on Heterogeneous Attention Patterns

Errui Ding

Jingdong Wang

ViT

269

11 Oct 2023

Distance Weighted Trans Network for Image Completion

Xuelong Li

206

11 Oct 2023

Distilling Efficient Vision Transformers from CNNs for Semantic Segmentation

Pattern Recognition (Pattern Recogn.), 2023

Xueye Zheng

Yunhao Luo

Pengyuan Zhou

Lin Wang

220

11 Oct 2023

EViT: An Eagle Vision Transformer with Bi-Fovea Self-Attention

IEEE Transactions on Cybernetics (IEEE Trans. Cybern.), 2023

381

10 Oct 2023

No Token Left Behind: Efficient Vision Transformer via Dynamic Token Idling

Applied Informatics (AI), 2023

Xiaojun Chang

229

09 Oct 2023

Plug n' Play: Channel Shuffle Module for Enhancing Tiny Vision Transformers

International Conference on Digital Image Computing: Techniques and Applications (DICTA), 2023

244

09 Oct 2023

Enhancing Representations through Heterogeneous Self-Supervised Learning

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023

362

08 Oct 2023

Low-Resolution Self-Attention for Semantic Segmentation

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023

471

08 Oct 2023

TiC: Exploring Vision Transformer in Convolution

Song Zhang

Qingzhong Wang

Jiang Bian

Haoyi Xiong

ViT

187

06 Oct 2023

ClusVPR: Efficient Visual Place Recognition with Clustering-based Weighted Transformer

IEEE Transactions on Artificial Intelligence (IEEE TAI), 2023

Yifan Xu

Pourya Shamsolmoali

Jie Yang

ViT

257

06 Oct 2023

TransRadar: Adaptive-Directional Transformer for Real-Time Multi-View Radar Semantic Segmentation

IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023

Yahia Dalbah

Jean Lahoud

Hisham Cholakkal

247

03 Oct 2023

Towards Training Without Depth Limits: Batch Normalization Without Gradient Explosion

International Conference on Learning Representations (ICLR), 2023

206

03 Oct 2023

Understanding Masked Autoencoders From a Local Contrastive Perspective

Wanli Ouyang

320

03 Oct 2023

PPT: Token Pruning and Pooling for Efficient Vision Transformers

278

03 Oct 2023

SeisT: A foundational deep learning model for earthquake monitoring tasks

IEEE Transactions on Geoscience and Remote Sensing (TGRS), 2023

218

02 Oct 2023

Win-Win: Training High-Resolution Vision Transformers from Two Windows

International Conference on Learning Representations (ICLR), 2023

274

01 Oct 2023

RBFormer: Improve Adversarial Robustness of Transformer by Robust Bias

British Machine Vision Conference (BMVC), 2023

Jiahang Cao

Renjing Xu

193

23 Sep 2023

Investigating Efficient Deep Learning Architectures For Side-Channel Attacks on AES

Yohai-Eliel Berreby

L. Sauvage

AAML

128

22 Sep 2023

CINFormer: Transformer network with multi-stage CNN feature injection for surface defect segmentation

Hao Liu

173

22 Sep 2023

RMT: Retentive Networks Meet Vision Transformers

Computer Vision and Pattern Recognition (CVPR), 2023

600

172

20 Sep 2023