2103.15808
Cited By
CvT: Introducing Convolutions to Vision Transformers
IEEE International Conference on Computer Vision (ICCV), 2021
29 March 2021
Haiping Wu
Bin Xiao
Noel Codella
Mengchen Liu
Xiyang Dai
Lu Yuan
Lei Zhang
ViT
HuggingFace (1 upvote)
Github (227★)
Papers citing "CvT: Introducing Convolutions to Vision Transformers"
50 / 858 papers shown
Title
GFT: Gradient Focal Transformer
Boris Kriuk
Simranjit Kaur Gill
Shoaib Aslam
Amir Fakhrutdinov
168
0
0
14 Apr 2025
Multi-modal and Multi-view Fundus Image Fusion for Retinopathy Diagnosis via Multi-scale Cross-attention and Shifted Window Self-attention
Yonghao Huang
Leiting Chen
Chuan Zhou
155
0
0
12 Apr 2025
A Hybrid Fully Convolutional CNN-Transformer Model for Inherently Interpretable Disease Detection from Retinal Fundus Images
K. Djoumessi
Samuel Ofosu Mensah
Philipp Berens
ViT
MedIm
262
0
0
11 Apr 2025
HGFormer: Topology-Aware Vision Transformer with HyperGraph Learning
IEEE transactions on multimedia (TMM), 2025
Hao Wang
Shuo Zhang
Biao Leng
ViT
520
3
0
03 Apr 2025
Multi-Token Attention
O. Yu. Golovneva
Tianlu Wang
Jason Weston
Sainbayar Sukhbaatar
279
3
0
01 Apr 2025
Spectral-Adaptive Modulation Networks for Visual Perception
Guhnoo Yun
J. Yoo
Kijung Kim
Jeongho Lee
Paul Hongsuck Seo
Dong Hwan Kim
362
0
0
31 Mar 2025
Video-based Traffic Light Recognition by Rockchip RV1126 for Autonomous Driving
Miao Fan
Xuxu Kong
Shengtong Xu
Haoyi Xiong
Xiangzeng Liu
ViT
191
1
0
31 Mar 2025
Stack Transformer Based Spatial-Temporal Attention Model for Dynamic Sign Language and Fingerspelling Recognition
Koki Hirooka
Abu Saleh Musa Miah
Tatsuya Murakami
Yuto Akiba
Yong Seok Hwang
Jungpil Shin
SLR
165
0
0
21 Mar 2025
DVHGNN: Multi-Scale Dilated Vision HGNN for Efficient Vision Recognition
Computer Vision and Pattern Recognition (CVPR), 2025
Caoshuo Li
Tanzhe Li
Xiaobin Hu
Donghao Luo
Taisong Jin
214
4
0
19 Mar 2025
CLIP-Free, Label-Free, Zero-Shot Concept Bottleneck Models
Fawaz Sammani
Jonas Fischer
Nikos Deligiannis
VLM
169
0
0
14 Mar 2025
LEDiT: Your Length-Extrapolatable Diffusion Transformer without Positional Encoding
Shen Zhang
Yaning Tan
Zhaowei Chen
Linze Li
...
Shuheng Li
Zhenyu Zhao
Caihua Chen
Jiajun Liang
Yao Tang
298
1
0
06 Mar 2025
Transformers with Joint Tokens and Local-Global Attention for Efficient Human Pose Estimation
K. A. Kinfu
René Vidal
ViT
230
0
0
28 Feb 2025
Escaping The Big Data Paradigm in Self-Supervised Representation Learning
Carlos Vélez García
Miguel Cazorla
Jorge Pomares
175
0
0
25 Feb 2025
VesselSAM: Leveraging SAM for Aortic Vessel Segmentation with AtrousLoRA
Adnan Iltaf
Rayan Merghani Ahmed
Bin Li
Shoujun Zhou
435
1
0
25 Feb 2025
MedKAN: An Advanced Kolmogorov-Arnold Network for Medical Image Classification
Zhuoqin Yang
Jiansong Zhang
Xiaoling Luo
Zheng Lu
Linlin Shen
MedIm
252
6
0
25 Feb 2025
SparseFormer: Detecting Objects in HRW Shots via Sparse Vision Transformer
ACM Multimedia (MM), 2024
Wenxi Li
Yuchen Guo
Jilai Zheng
Haozhe Lin
Chao Ma
Lu Fang
Yunbo Wang
ViT
350
5
0
11 Feb 2025
MicroViT: A Vision Transformer with Low Complexity Self Attention for Edge Device
International Symposium on Circuits and Systems (ISCAS), 2025
Novendra Setyawan
Chi-Chia Sun
Mao-Hsiu Hsu
W. Kuo
Jun-Wei Hsieh
ViT
937
9
0
09 Feb 2025
PolaFormer: Polarity-aware Linear Attention for Vision Transformers
International Conference on Learning Representations (ICLR), 2025
Weikang Meng
Yadan Luo
Xin Li
Shihong Deng
Zheng Zhang
968
29
0
25 Jan 2025
Parallel Sequence Modeling via Generalized Spatial Propagation Network
Computer Vision and Pattern Recognition (CVPR), 2025
Hongjun Wang
Wonmin Byeon
Jiarui Xu
Liang Feng
Ka Chun Cheung
Xiaolong Wang
Kai Han
Jan Kautz
Sifei Liu
770
3
0
21 Jan 2025
Confident Pseudo-labeled Diffusion Augmentation for Canine Cardiomegaly Detection
Shiman Zhang
Lakshmikar R. Polamreddy
Youshan Zhang
MedIm
DiffM
232
1
0
13 Jan 2025
Image Classification with Deep Reinforcement Active Learning
Mingyuan Jiu
Xuguang Song
H. Sahbi
Shupan Li
Yan Chen
Wei Guo
Lihua Guo
Mingliang Xu
VLM
179
1
0
31 Dec 2024
Towards Simple and Provable Parameter-Free Adaptive Gradient Methods
Yuanzhe Tao
Huizhuo Yuan
Xun Zhou
Yuan Cao
Q. Gu
ODL
181
2
0
27 Dec 2024
Unity is Strength: Unifying Convolutional and Transformeral Features for Better Person Re-Identification
Yuhao Wang
Pingping Zhang
Xuehu Liu
Zhengzheng Tu
Huchuan Lu
213
7
0
23 Dec 2024
Model Decides How to Tokenize: Adaptive DNA Sequence Tokenization with MxDNA
Neural Information Processing Systems (NeurIPS), 2024
Lifeng Qiao
Peng Ye
Yuchen Ren
Weiqiang Bai
Chaoqi Liang
Cheng Wang
Nanqing Dong
W. Ouyang
274
6
0
18 Dec 2024
Reversing the Damage: A QP-Aware Transformer-Diffusion Approach for 8K Video Restoration under Codec Compression
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2024
Ali Mollaahmadi Dehaghi
Reza Razavi
Mohammad Moshirpour
265
3
0
12 Dec 2024
Cascaded Multi-Scale Attention for Enhanced Multi-Scale Feature Extraction and Interaction with Low-Resolution Images
Xiangyong Lu
Masanori Suganuma
Takayuki Okatani
429
2
0
03 Dec 2024
Multi-Token Enhancing for Vision Representation Learning
Zhong-Yu Li
Yu-Song Hu
Bo Yin
Ming-Ming Cheng
392
1
0
24 Nov 2024
FG-CXR: A Radiologist-Aligned Gaze Dataset for Enhancing Interpretability in Chest X-Ray Report Generation
Asian Conference on Computer Vision (ACCV), 2024
Trong-Thang Pham
Ngoc-Vuong Ho
Nhat-Tan Bui
T. Phan
Patel Brijesh
...
Gianfranco Doretto
Anh Nguyen
Carol C. Wu
Hien Nguyen
Ngan Le
311
7
0
23 Nov 2024
ReXrank: A Public Leaderboard for AI-Powered Radiology Report Generation
Xiaoman Zhang
Hong-Yu Zhou
Xiaoli Yang
Oishi Banerjee
J. N. Acosta
Josh Miller
Ouwen Huang
Pranav Rajpurkar
LM&MA
348
12
0
22 Nov 2024
D-Cube: Exploiting Hyper-Features of Diffusion Model for Robust Medical Classification
Industrial Conference on Data Mining (IDM), 2024
Minhee Jang
Juheon Son
Thanaporn Viriyasaranon
Junho Kim
Jang-Hwan Choi
MedIm
303
0
0
17 Nov 2024
SAG-ViT: A Scale-Aware, High-Fidelity Patching Approach with Graph Attention for Vision Transformers
Shravan Venkatraman
Jaskaran Singh Walia
J. Raheja
ViT
439
2
0
14 Nov 2024
Breaking the Low-Rank Dilemma of Linear Attention
Computer Vision and Pattern Recognition (CVPR), 2024
Qihang Fan
Huaibo Huang
Ran He
371
12
0
12 Nov 2024
CFPNet: Improving Lightweight ToF Depth Completion via Cross-zone Feature Propagation
International Conference on 3D Vision (3DV), 2024
Laiyan Ding
Hualie Jiang
Rui Xu
Rui Huang
489
4
0
07 Nov 2024
Reducing catastrophic forgetting of incremental learning in the absence of rehearsal memory with task-specific token
Young Jo Choi
Min Kyoon Yoo
Yu Rang Park
145
0
0
06 Nov 2024
HRPVT: High-Resolution Pyramid Vision Transformer for medium and small-scale human pose estimation
Zhoujie Xu
ViT
3DH
176
5
0
29 Oct 2024
NT-VOT211: A Large-Scale Benchmark for Night-time Visual Object Tracking
Asian Conference on Computer Vision (ACCV), 2024
Yu Liu
Arif Mahmood
Muhammad Haris Khan
188
5
0
27 Oct 2024
UTSRMorph: A Unified Transformer and Superresolution Network for Unsupervised Medical Image Registration
IEEE Transactions on Medical Imaging (IEEE TMI), 2024
Runshi Zhang
Hao Mo
Junchen Wang
Bimeng Jie
Yang He
Nenghao Jin
Liang Zhu
ViT
MedIm
135
9
0
27 Oct 2024
TEAM: Topological Evolution-aware Framework for Traffic Forecasting--Extended Version
Proceedings of the VLDB Endowment (PVLDB), 2024
Duc Kieu
Tung Kieu
Peng Han
Bin Yang
Christian S. Jensen
Bac Le
AI4TS
191
6
0
24 Oct 2024
FIPER: Factorized Features for Robust Image Super-Resolution and Compression
Yang-Che Sun
Cheng Yu Yeo
Ernie Chu
Jun-Cheng Chen
Yu-Lun Liu
SupR
499
0
0
23 Oct 2024
Frontiers in Intelligent Colonoscopy
Ge-Peng Ji
Jingyi Liu
Peng Xu
Nick Barnes
Fahad Shahbaz Khan
Salman Khan
Deng-Ping Fan
324
11
0
22 Oct 2024
Rethinking Transformer for Long Contextual Histopathology Whole Slide Image Analysis
Neural Information Processing Systems (NeurIPS), 2024
Honglin Li
Yunlong Zhang
Pingyi Chen
Chenglu Zhu
Lin Yang
MedIm
249
12
0
18 Oct 2024
Improving Vision Transformers by Overlapping Heads in Multi-Head Self-Attention
Tianxiao Zhang
Bo Luo
G. Wang
ViT
162
2
0
18 Oct 2024
On Partial Prototype Collapse in the DINO Family of Self-Supervised Methods
British Machine Vision Conference (BMVC), 2024
Hariprasath Govindarajan
Per Sidén
Jacob Roll
Fredrik Lindsten
165
4
0
17 Oct 2024
CTA-Net: A CNN-Transformer Aggregation Network for Improving Multi-Scale Feature Extraction
Chunlei Meng
Jiacheng Yang
Wei Lin
Bowen Liu
Hongda Zhang
Chun Ouyang
Zhongxue Gan
ViT
229
3
0
15 Oct 2024
Locality Alignment Improves Vision-Language Models
International Conference on Learning Representations (ICLR), 2024
Ian Covert
Tony Sun
James Zou
Tatsunori Hashimoto
VLM
517
11
0
14 Oct 2024
BA-Net: Bridge Attention in Deep Neural Networks
Expert systems with applications (ESWA), 2024
Ronghui Zhang
Runzong Zou
Yue Zhao
Zirui Zhang
Junzhou Chen
Yue Cao
Chuan Hu
Houbing Song
165
2
0
10 Oct 2024
Guided Self-attention: Find the Generalized Necessarily Distinct Vectors for Grain Size Grading
Fang Gao
XueTao Li
Jiabao Wang
Shengheng Ma
Jun Yu
108
0
0
08 Oct 2024
Bridging Local and Global Knowledge via Transformer in Board Games
International Joint Conference on Artificial Intelligence (IJCAI), 2024
Tai-Lin Wu
Chung-Chin Shih
Yan-Ru Ju
AAML
204
0
0
07 Oct 2024
SynCo: Synthetic Hard Negatives in Contrastive Learning for Better Unsupervised Visual Representations
Nikolaos Giakoumoglou
Tania Stathaki
SSL
441
1
0
03 Oct 2024
Beyond Skip Connection: Pooling and Unpooling Design for Elimination Singularities
AAAI Conference on Artificial Intelligence (AAAI), 2024
Chengkun Sun
Jinqian Pan
Juoli Jin
Russell Stevens Terry
Jiang Bian
Jie Xu
136
0
0
20 Sep 2024