CvT: Introducing Convolutions to Vision Transformers

IEEE International Conference on Computer Vision (ICCV), 2021
29 March 2021
Haiping Wu
Bin Xiao
Noel Codella
Xiyang Dai
Lu Yuan
Lei Zhang
    ViT
ArXiv (abs) | PDF | HTML | HuggingFace (1 upvote) | GitHub (227★)

Papers citing "CvT: Introducing Convolutions to Vision Transformers"

50 / 860 papers shown
SynCo: Synthetic Hard Negatives in Contrastive Learning for Better Unsupervised Visual Representations
Nikolaos Giakoumoglou
Tania Stathaki
SSL
544
7
0
03 Oct 2024
Beyond Skip Connection: Pooling and Unpooling Design for Elimination Singularities
AAAI Conference on Artificial Intelligence (AAAI), 2024
Chengkun Sun
Jinqian Pan
Juoli Jin
Russell Stevens Terry
Jiang Bian
Jie Xu
176
1
0
20 Sep 2024
RingMo-Aerial: An Aerial Remote Sensing Foundation Model With Affine Transformation Contrastive Learning
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024
Wenhui Diao
Haichen Yu
Kaiyue Kang
Tong Ling
Di Liu
...
Hanbo Bi
Libo Ren
Xuexue Li
Yongqiang Mao
Xian Sun
1.0K
9
0
20 Sep 2024
Sparks of Artificial General Intelligence(AGI) in Semiconductor Material Science: Early Explorations into the Next Frontier of Generative AI-Assisted Electron Micrograph Analysis
Sakhinana Sagar Srinivas
Geethan Sannidhi
Sreeja Gangasani
Chidaksh Ravuru
Venkataramana Runkana
229
0
0
17 Sep 2024
GLCONet: Learning Multi-source Perception Representation for Camouflaged Object Detection
IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2024
Yanguang Sun
Hanyu Xuan
Zhiqiang Wang
Lei Luo
ObjD
188
21
0
15 Sep 2024
Domain-Invariant Representation Learning of Bird Sounds
Ilyass Moummad
Romain Serizel
Emmanouil Benetos
Nicolas Farrugia
SSL
430
6
0
13 Sep 2024
SDformer: Efficient End-to-End Transformer for Depth Completion
Jian Qian
Miao Sun
Ashley Lee
Jie Li
Shenglong Zhuo
Patrick Chiang
ViT
MDE
296
5
0
12 Sep 2024
ASSNet: Adaptive Semantic Segmentation Network for Microtumors and Multi-Organ Segmentation
Fuchen Zheng
Xinyi Chen
Xuhang Chen
Haolun Li
Xiaojiao Guo
Guoheng Huang
Chi-Man Pun
Shoujun Zhou
ViT
MedIm
175
0
0
12 Sep 2024
Advancing Depth Anything Model for Unsupervised Monocular Depth Estimation in Endoscopy
Bojian Li
Bo Liu
Dan Si
Jinghua Yue
F. Zhou
MedIm
MDE
397
5
0
12 Sep 2024
PanAdapter: Two-Stage Fine-Tuning with Spatial-Spectral Priors Injecting for Pansharpening
AAAI Conference on Artificial Intelligence (AAAI), 2024
Ruocheng Wu
ZiEn Zhang
ShangQi Deng
YuLe Duan
LiangJian Deng
175
4
0
11 Sep 2024
Brain-Inspired Stepwise Patch Merging for Vision Transformers
International Joint Conference on Artificial Intelligence (IJCAI), 2024
Yonghao Yu
Dongcheng Zhao
Guobin Shen
Yiting Dong
Yi Zeng
385
0
0
11 Sep 2024
Exploring Rich Subjective Quality Information for Image Quality Assessment in the Wild
Xiongkuo Min
Yixuan Gao
Yuqin Cao
Guangtao Zhai
Wenjun Zhang
Huifang Sun
C. Chen
149
63
0
09 Sep 2024
UNIT: Unifying Image and Text Recognition in One Vision Encoder
Neural Information Processing Systems (NeurIPS), 2024
Yi Zhu
Yanpeng Zhou
Chunwei Wang
Yang Cao
Jianhua Han
Lu Hou
Hang Xu
ViT
VLM
311
9
0
06 Sep 2024
MVTN: A Multiscale Video Transformer Network for Hand Gesture Recognition
Mallika Garg
Debashis Ghosh
P. M. Pradhan
ViT
245
1
0
05 Sep 2024
TBConvL-Net: A Hybrid Deep Learning Architecture for Robust Medical Image Segmentation
Pattern Recognition (Pattern Recogn.), 2024
Shahzaib Iqbal
Tariq M. Khan
Syed S. Naqvi
Asim Naveed
Erik H. W. Meijering
MedIm
268
46
0
05 Sep 2024
Frequency-Spatial Entanglement Learning for Camouflaged Object Detection
European Conference on Computer Vision (ECCV), 2024
Yanguang Sun
Chunyan Xu
Zhiqiang Wang
Hanyu Xuan
Lei Luo
273
69
0
03 Sep 2024
Dreaming is All You Need
Mingze Ni
Wei Liu
140
0
0
03 Sep 2024
A Hybrid Transformer-Mamba Network for Single Image Deraining
Shangquan Sun
Wenqi Ren
Juxiang Zhou
Jianhou Gan
Rui Wang
Xiaochun Cao
Mamba
335
17
0
31 Aug 2024
SMAFormer: Synergistic Multi-Attention Transformer for Medical Image Segmentation
IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2024
Fuchen Zheng
Xuhang Chen
Weihuang Liu
Haolun Li
Yingtie Lei
Jiahui He
Chi-Man Pun
Shoujun Zhou
MedIm
751
44
0
31 Aug 2024
Parameter-Efficient Quantized Mixture-of-Experts Meets Vision-Language Instruction Tuning for Semiconductor Electron Micrograph Analysis
Sakhinana Sagar Srinivas
Chidaksh Ravuru
Geethan Sannidhi
Venkataramana Runkana
243
1
0
27 Aug 2024
Multi-Modal Instruction-Tuning Small-Scale Language-and-Vision Assistant for Semiconductor Electron Micrograph Analysis
AAAI Spring Symposia (SSS), 2024
Sakhinana Sagar Srinivas
Geethan Sannidhi
Venkataramana Runkana
270
1
0
27 Aug 2024
Hierarchical Network Fusion for Multi-Modal Electron Micrograph Representation Learning with Foundational Large Language Models
Sakhinana Sagar Srinivas
Geethan Sannidhi
Venkataramana Runkana
278
0
0
24 Aug 2024
Preliminary Investigations of a Multi-Faceted Robust and Synergistic Approach in Semiconductor Electron Micrograph Analysis: Integrating Vision Transformers with Large Language and Multimodal Models
Sakhinana Sagar Srinivas
Geethan Sannidhi
Sreeja Gangasani
Chidaksh Ravuru
Venkataramana Runkana
283
0
0
24 Aug 2024
Foundational Model for Electron Micrograph Analysis: Instruction-Tuning Small-Scale Language-and-Vision Assistant for Enterprise Adoption
Sakhinana Sagar Srinivas
Chidaksh Ravuru
Geethan Sannidhi
Venkataramana Runkana
156
0
0
23 Aug 2024
Vision HgNN: An Electron-Micrograph is Worth Hypergraph of Hypernodes
Sakhinana Sagar Srinivas
Rajat Kumar Sarkar
Sreeja Gangasani
Venkataramana Runkana
320
2
0
21 Aug 2024
sTransformer: A Modular Approach for Extracting Inter-Sequential and Temporal Information for Time-Series Forecasting
Jiaheng Yin
Zhengxin Shi
Jianshen Zhang
Xiaomin Lin
Yulin Huang
Yongzhi Qi
Wei Qi
AI4TS
133
0
0
19 Aug 2024
MetaSeg: MetaFormer-based Global Contexts-aware Network for Efficient Semantic Segmentation
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2024
Beoungwoo Kang
Seunghun Moon
Yubin Cho
Hyunwoo Yu
Suk-Ju Kang
ViT
MedIm
270
25
0
14 Aug 2024
Advanced Vision Transformers and Open-Set Learning for Robust Mosquito Classification: A Novel Approach to Entomological Studies
Ahmed Akib Jawad Karim
Muhammad Zawad Mahmud
Riasat Khan
121
4
0
12 Aug 2024
Efficient Visual Representation Learning with Heat Conduction Equation
International Joint Conference on Artificial Intelligence (IJCAI), 2024
Zhemin Zhang
Xun Gong
DiffM
3DV
289
0
0
12 Aug 2024
MacFormer: Semantic Segmentation with Fine Object Boundaries
Guoan Xu
Wenfeng Huang
Tao Wu
Ligeng Chen
Wenjing Jia
Guangwei Gao
Xiatian Zhu
Stuart W. Perry
283
4
0
11 Aug 2024
CAS-ViT: Convolutional Additive Self-attention Vision Transformers for Efficient Mobile Applications
Tianfang Zhang
Lei Li
Yang Zhou
Wentao Liu
Chen Qian
Xiangyang Ji
ViT
213
72
0
07 Aug 2024
Multi-label Sewer Pipe Defect Recognition with Mask Attention Feature Enhancement and Label Correlation Learning
Xin Zuo
Yu Sheng
Jifeng Shen
Yongwei Shan
173
0
0
01 Aug 2024
Depth-Wise Convolutions in Vision Transformers for Efficient Training on Small Datasets
Tianxiao Zhang
Wenju Xu
Bo Luo
Guanghui Wang
ViT
MDE
442
40
0
28 Jul 2024
A Survey on Cell Nuclei Instance Segmentation and Classification: Leveraging Context and Attention
João D. Nunes
D. Montezuma
Domingos Oliveira
Tania Pereira
Jaime S. Cardoso
274
0
0
26 Jul 2024
VSSD: Vision Mamba with Non-Causal State Space Duality
Yuheng Shi
Minjing Dong
Mingjia Li
Chang Xu
Mamba
350
3
0
26 Jul 2024
Quasar-ViT: Hardware-Oriented Quantization-Aware Architecture Search for Vision Transformers
Zhengang Li
Alec Lu
Yanyue Xie
Zhenglun Kong
Mengshu Sun
...
Zhaoyang Han
Caiwen Ding
Yanzhi Wang
Xue Lin
Zhenman Fang
243
10
0
25 Jul 2024
How Lightweight Can A Vision Transformer Be
Jen Hong Tan
ViT
MoE
222
1
0
25 Jul 2024
Embedding-Free Transformer with Inference Spatial Reduction for Efficient Semantic Segmentation
Hyunwoo Yu
Yubin Cho
Beoungwoo Kang
Seunghun Moon
Kyeongbo Kong
Suk-Ju Kang
246
12
0
24 Jul 2024
HERGen: Elevating Radiology Report Generation with Longitudinal Data
Fuying Wang
Shenghui Du
Lequan Yu
MedIm
266
20
0
21 Jul 2024
DuoFormer: Leveraging Hierarchical Visual Representations by Local and Global Attention
Xiaoya Tang
Bodong Zhang
Beatrice Knudsen
Tolga Tasdizen
ViT
MedIm
296
4
0
18 Jul 2024
SegPoint: Segment Any Point Cloud via Large Language Model
Shuting He
Henghui Ding
Xudong Jiang
Bihan Wen
3DV
MLLM
3DPC
254
37
0
18 Jul 2024
AFIDAF: Alternating Fourier and Image Domain Adaptive Filters as an Efficient Alternative to Attention in ViTs
Yunling Zheng
Zeyi Xu
Fanghui Xue
Biao Yang
Jiancheng Lyu
Shuai Zhang
Y. Qi
Jack Xin
212
0
0
16 Jul 2024
TCFormer: Visual Recognition via Token Clustering Transformer
Wang Zeng
Sheng Jin
Lumin Xu
Wentao Liu
Chao Qian
Wanli Ouyang
Ping Luo
Xiaogang Wang
198
15
0
16 Jul 2024
TractGraphFormer: Anatomically Informed Hybrid Graph CNN-Transformer Network for Interpretable Sex and Age Prediction from Diffusion MRI Tractography
Yuqian Chen
Fan Zhang
Meng Wang
L. Zekelman
Suheyla Cetin Karayumak
...
J. Rushmore
N. Makris
Yogesh Rathi
Weidong Cai
L. O’Donnell
MedIm
ViT
188
1
0
11 Jul 2024
Parameter Efficient Fine Tuning for Multi-scanner PET to PET Reconstruction
Yumin Kim
Gayoon Choi
Seong Jae Hwang
175
1
0
10 Jul 2024
HAFormer: Unleashing the Power of Hierarchy-Aware Features for Lightweight Semantic Segmentation
Guoan Xu
Wenjing Jia
Tao Wu
Ligeng Chen
Guangwei Gao
ViT
287
24
0
10 Jul 2024
iiANET: Inception Inspired Attention Hybrid Network for efficient Long-Range Dependency
Haruna Yunusa
Qin Shiyin
Abdulrahman Hamman Adama Chukkol
Isah Bello
A. Lawan
289
4
0
10 Jul 2024
Fish-Vista: A Multi-Purpose Dataset for Understanding & Identification of Traits from Images
Kazi Sajeed Mehrab
M. Maruf
Arka Daw
Harish Babu Manogaran
Abhilash Neog
...
Paula Mabee
Wasila Dahdul
Anuj Karpatne
459
8
0
10 Jul 2024
CTRL-F: Pairing Convolution with Transformer for Image Classification via Multi-Level Feature Cross-Attention and Representation Learning Fusion
Hosam S. El-Assiouti
Hadeer El-Saadawy
M. Al-Berry
M. Tolba
ViT
241
0
0
09 Jul 2024
CBM: Curriculum by Masking
Andrei Jarca
Florinel-Alin Croitoru
Radu Tudor Ionescu
258
5
0
06 Jul 2024