CvT: Introducing Convolutions to Vision Transformers

IEEE International Conference on Computer Vision (ICCV), 2021
29 March 2021
Haiping Wu
Bin Xiao
Noel Codella
Mengchen Liu
Xiyang Dai
Lu Yuan
Lei Zhang
    ViT
ArXiv (abs) · PDF · HTML · HuggingFace (1 upvote) · GitHub (227★)

Papers citing "CvT: Introducing Convolutions to Vision Transformers"

50 / 860 papers shown
Title
Kolmogorov-Arnold Convolutions: Design Principles and Empirical Studies
Ivan Drokin
309
61
0
01 Jul 2024
Query-Efficient Hard-Label Black-Box Attack against Vision Transformers
Chao Zhou
Xiaowen Shi
Yuan-Gen Wang
ViT AAML
199
1
0
29 Jun 2024
Fibottention: Inceptive Visual Representation Learning with Diverse Attention Across Heads
Ali Khaleghi Rahimian
Manish Kumar Govind
Subhajit Maity
Dominick Reilly
Christian Kummerle
Srijan Das
A. Dutta
217
1
0
27 Jun 2024
Pamba: Enhancing Global Interaction in Point Clouds via State Space Model
Hao Sun
Yubo Ai
Jiahao Lu
Chuxin Wang
Jiacheng Deng
Hanzhi Chang
Yanzhe Liang
Wenfei Yang
Shifeng Zhang
Tianzhu Zhang
Mamba
153
0
0
25 Jun 2024
Implicit-Zoo: A Large-Scale Dataset of Neural Implicit Functions for 2D Images and 3D Scenes
Qi Ma
Danda Pani Paudel
E. Konukoglu
Luc Van Gool
256
10
0
25 Jun 2024
A Primal-Dual Framework for Transformers and Neural Networks
Tan M. Nguyen
Tam Nguyen
Nhat Ho
Andrea L. Bertozzi
Richard G. Baraniuk
Stanley J. Osher
ViT
175
16
0
19 Jun 2024
Learning to Adapt Foundation Model DINOv2 for Capsule Endoscopy Diagnosis
Procedia Computer Science (PCS), 2024
Bowen Zhang
Ying Chen
Long Bai
Yan Zhao
Yuxiang Sun
Yixuan Yuan
Jianhua Zhang
Hongliang Ren
288
11
0
15 Jun 2024
AdaNCA: Neural Cellular Automata As Adaptors For More Robust Vision Transformer
Yitao Xu
Tong Zhang
Sabine Süsstrunk
ViT
356
2
0
12 Jun 2024
Adaptively Bypassing Vision Transformer Blocks for Efficient Visual Tracking
Xiangyang Yang
Dan Zeng
Xucheng Wang
You Wu
Hengzhou Ye
Qijun Zhao
Shuiwang Li
285
15
0
12 Jun 2024
You Only Need Less Attention at Each Stage in Vision Transformers
Shuoxi Zhang
Hanpeng Liu
Stephen Lin
Kun He
226
15
0
01 Jun 2024
Automatic Channel Pruning for Multi-Head Attention
Eunho Lee
Youngbae Hwang
ViT
239
2
0
31 May 2024
Optimizing Foundation Model Inference on a Many-tiny-core Open-source RISC-V Platform
Viviane Potocnik
Luca Colagrande
Tim Fischer
L. Bertaccini
Daniele Jahier Pagliari
Luca Bompani
Luca Benini
278
4
0
29 May 2024
ViG: Linear-complexity Visual Sequence Learning with Gated Linear Attention
Bencheng Liao
Xinggang Wang
Lianghui Zhu
Qian Zhang
Chang Huang
295
8
0
28 May 2024
XFormParser: A Simple and Effective Multimodal Multilingual Semi-structured Form Parser
Xianfu Cheng
Hang Zhang
Zhiqiang Wang
Xiang Li
Weixiao Zhou
...
Fei Liu
Wei Zhang
Tao Sun
Tongliang Li
Zhoujun Li
228
4
0
27 May 2024
ETTrack: Enhanced Temporal Motion Predictor for Multi-Object Tracking
Xudong Han
Nobuyuki Oishi
Yueying Tian
Elif Ucurum
R. Young
C. Chatwin
Philip Birch
232
15
0
24 May 2024
YOLOv10: Real-Time End-to-End Object Detection
Neural Information Processing Systems (NeurIPS), 2024
Ao Wang
Hui Chen
Lihao Liu
Kai Chen
Zijia Lin
Jungong Han
Guiguang Ding
3DH
278
2,980
0
23 May 2024
A Survey on Vision-Language-Action Models for Embodied AI
Yueen Ma
Zixing Song
Yuzheng Zhuang
Jianye Hao
Irwin King
LM&Ro
856
164
0
23 May 2024
CSTA: CNN-based Spatiotemporal Attention for Video Summarization
Jaewon Son
Jaehun Park
Kwangsu Kim
AI4TS ViT
282
20
0
20 May 2024
Stereo-Knowledge Distillation from dpMV to Dual Pixels for Light Field Video Reconstruction
Aryan Garg
Raghav Mallampali
Akshat Joshi
Shrisudhan Govindarajan
Kaushik Mitra
263
1
0
20 May 2024
GestFormer: Multiscale Wavelet Pooling Transformer Network for Dynamic Hand Gesture Recognition
Mallika Garg
Debashis Ghosh
P. M. Pradhan
SLR ViT
225
14
0
18 May 2024
All in One Framework for Multimodal Re-identification in the Wild
He Li
Mang Ye
Ming Zhang
Bo Du
269
27
0
08 May 2024
Examining Changes in Internal Representations of Continual Learning Models Through Tensor Decomposition
Nishant Suresh Aswani
Amira Guesmi
Muhammad Abdullah Hanif
Mohamed Bennai
CLL
168
1
0
06 May 2024
A separability-based approach to quantifying generalization: which layer is best?
Luciano Dyballa
Evan Gerritz
Steven W. Zucker
OOD
325
4
0
02 May 2024
Fusing Depthwise and Pointwise Convolutions for Efficient Inference on GPUs
Fareed Qararyah
M. Azhar
Mohammad Ali Maleki
Pedro Trancoso
168
4
0
30 Apr 2024
ShadowMaskFormer: Mask Augmented Patch Embeddings for Shadow Removal
Zhuohao Li
Guoyang Xie
Guannan Jiang
Zhichao Lu
327
8
0
29 Apr 2024
GLIMS: Attention-Guided Lightweight Multi-Scale Hybrid Network for Volumetric Semantic Segmentation
Z. A. Yazici
Ilkay Oksuz
H. K. Ekenel
MedIm
219
12
0
27 Apr 2024
PromptCIR: Blind Compressed Image Restoration with Prompt Learning
Bingchen Li
Xin Li
Yiting Lu
Ruoyu Feng
Mengxi Guo
Shijie Zhao
Li Zhang
Zhibo Chen
303
25
0
26 Apr 2024
MathNet: A Data-Centric Approach for Printed Mathematical Expression Recognition
Felix M. Schmitt-Koopmann
Elaine M. Huang
Hans-Peter Hutter
Thilo Stadelmann
Alireza Darvishy
165
10
0
21 Apr 2024
Nested-TNT: Hierarchical Vision Transformers with Multi-Scale Feature Processing
Yuang Liu
Zhiheng Qiu
Xiaokai Qin
ViT
242
0
0
20 Apr 2024
An Experimental Study on Exploring Strong Lightweight Vision Transformers via Masked Image Modeling Pre-Training
Jin Gao
Shubo Lin
Shaoru Wang
Yutong Kou
Zeming Li
Liang Li
Congxuan Zhang
Xiaoqin Zhang
Yizheng Wang
Weiming Hu
265
5
0
18 Apr 2024
Training Transformer Models by Wavelet Losses Improves Quantitative and Visual Performance in Single Image Super-Resolution
Cansu Korkmaz
A. Murat Tekalp
ViT
302
19
0
17 Apr 2024
Weight Copy and Low-Rank Adaptation for Few-Shot Distillation of Vision Transformers
Diana-Nicoleta Grigore
Mariana-Iuliana Georgescu
J. A. Justo
T. Johansen
Andreea-Iuliana Ionescu
Radu Tudor Ionescu
302
1
0
14 Apr 2024
TSLANet: Rethinking Transformers for Time Series Representation Learning
Emadeldeen Eldele
Mohamed Ragab
Zhenghua Chen
Ruibing Jin
Xiaoli Li
AI4TS AIFin
316
112
0
12 Apr 2024
Robust feature knowledge distillation for enhanced performance of lightweight crack segmentation models
Zhaohui Chen
Elyas Asadi Shamsabadi
Sheng Jiang
Luming Shen
Daniel Dias-da-Costa
161
2
0
09 Apr 2024
Using Few-Shot Learning to Classify Primary Lung Cancer and Other Malignancy with Lung Metastasis in Cytological Imaging via Endobronchial Ultrasound Procedures
Ching-Kai Lin
Di-Chun Wei
Yun-Chien Cheng
315
0
0
09 Apr 2024
Lightweight Deep Learning for Resource-Constrained Environments: A Survey
Hou-I Liu
Marco Galindo
Hongxia Xie
Lai-Kuan Wong
Hong-Han Shuai
Yung-Hui Li
Wen-Huang Cheng
335
146
0
08 Apr 2024
HSViT: Horizontally Scalable Vision Transformer
Chenhao Xu
Chang-Tsun Li
Chee Peng Lim
Douglas Creighton
ViT
211
6
0
08 Apr 2024
GvT: A Graph-based Vision Transformer with Talking-Heads Utilizing Sparsity, Trained from Scratch on Small Datasets
Dongjing Shan
Guiqiang Chen
ViT
308
1
0
07 Apr 2024
Learning Correlation Structures for Vision Transformers
Manjin Kim
Paul Hongsuck Seo
Cordelia Schmid
Minsu Cho
ViT
285
25
0
05 Apr 2024
ViTamin: Designing Scalable Vision Models in the Vision-Language Era
Computer Vision and Pattern Recognition (CVPR), 2024
Jieneng Chen
Qihang Yu
Xiaohui Shen
Yaoyao Liu
Liang-Chieh Chen
3DV VLM
397
50
0
02 Apr 2024
Structured Initialization for Attention in Vision Transformers
Jianqiao Zheng
Xueqian Li
Simon Lucey
ViT
258
2
0
01 Apr 2024
Improving Visual Recognition with Hyperbolical Visual Hierarchy Mapping
Hyeongjun Kwon
Jinhyun Jang
Jin-Hwa Kim
Kwonyoung Kim
Kwanghoon Sohn
322
8
0
01 Apr 2024
Harnessing The Power of Attention For Patch-Based Biomedical Image Classification
Gousia Habib
Shaima Qureshi
Malik Ishfaq
111
1
0
01 Apr 2024
IPT-V2: Efficient Image Processing Transformer using Hierarchical Attentions
Zhijun Tu
Kunpeng Du
Hanting Chen
Hai-lin Wang
Wei Li
Jie Hu
Yunhe Wang
ViT
267
9
0
31 Mar 2024
Enhancing Efficiency in Vision Transformer Networks: Design Techniques and Insights
Moein Heidari
Reza Azad
Sina Ghorbani Kolahi
René Arimond
Leon Niggemeier
...
Afshin Bozorgpour
Ehsan Khodapanah Aghdam
Amirhossein Kazerouni
Ilker Hacihaliloglu
Dorit Merhof
290
14
0
28 Mar 2024
Heracles: A Hybrid SSM-Transformer Model for High-Resolution Image and Time-Series Analysis
Badri N. Patro
Suhas Ranganath
Vinay P. Namboodiri
Vijay Srinivas Agneeswaran
275
4
0
26 Mar 2024
PlainMamba: Improving Non-Hierarchical Mamba in Visual Recognition
Chenhongyi Yang
Zehui Chen
Miguel Espinosa
Linus Ericsson
Zhenyu Wang
Jiaming Liu
Elliot J. Crowley
Mamba
318
162
0
26 Mar 2024
A Survey on Deep Learning and State-of-the-art Applications
Mohd Halim Mohd Noor
A. O. Ige
AILaw MLAU
186
0
0
26 Mar 2024
Exploring Dynamic Transformer for Efficient Object Tracking
Jiawen Zhu
Xin Chen
Haiwen Diao
Shuai Li
Jun-Yan He
Chenyang Li
Bin Luo
Dong Wang
Huchuan Lu
372
12
0
26 Mar 2024
CFAT: Unleashing Triangular Windows for Image Super-resolution
Abhisek Ray
Gaurav Kumar
M. Kolekar
SupR
263
33
0
24 Mar 2024