CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification

27 March 2021
Chun-Fu Chen
Quanfu Fan
Rameswar Panda
    ViT

Papers citing "CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification"

50 / 175 papers shown
A survey on deep learning in medical image registration: new technologies, uncertainty, evaluation metrics, and beyond
Junyu Chen
Yihao Liu
Shuwen Wei
Zhangxing Bian
Shalini Subramanian
A. Carass
Jerry L. Prince
Yong Du
OOD
30
36
0
28 Jul 2023
Visual Prompt Flexible-Modal Face Anti-Spoofing
Zitong Yu
Rizhao Cai
Yawen Cui
Ajian Liu
Changsheng Chen
30
6
0
26 Jul 2023
SCA-PVNet: Self-and-Cross Attention Based Aggregation of Point Cloud and Multi-View for 3D Object Retrieval
Dongyun Lin
Yi Cheng
Aiyuan Guo
Shangbo Mao
Yiqun Li
3DPC
11
8
0
20 Jul 2023
Random Position Adversarial Patch for Vision Transformers
Mingzhen Shao
ViT
AAML
14
2
0
09 Jul 2023
All in One: Exploring Unified Vision-Language Tracking with Multi-Modal Alignment
Chunhui Zhang
Xin Sun
Li Liu
Yiqian Yang
Qiong Liu
Xiaoping Zhou
Yanfeng Wang
33
15
0
07 Jul 2023
RedMotion: Motion Prediction via Redundancy Reduction
Royden Wagner
Ömer Sahin Tas
Marvin Klemp
Carlos Fernandez Lopez
Christoph Stiller
44
6
0
19 Jun 2023
MTLSegFormer: Multi-task Learning with Transformers for Semantic Segmentation in Precision Agriculture
D. Gonçalves
J. M. Junior
Pedro Zamboni
H. Pistori
Jonathan Li
Keiller Nogueira
W. Gonçalves
27
5
0
04 May 2023
RViDeformer: Efficient Raw Video Denoising Transformer with a Larger Benchmark Dataset
Huanjing Yue
Cong Cao
Lei Liao
Jingyu Yang
ViT
39
6
0
01 May 2023
MASK-CNN-Transformer For Real-Time Multi-Label Weather Recognition
Shengchao Chen
Ting Shu
Huani Zhao
Yuan Yan Tang
ViT
32
14
0
28 Apr 2023
Fairness in Visual Clustering: A Novel Transformer Clustering Approach
Xuan-Bac Nguyen
C. Duong
Marios Savvides
Kaushik Roy
Hugh Churchill
Khoa Luu
24
9
0
14 Apr 2023
Cross Attention Transformers for Multi-modal Unsupervised Whole-Body PET Anomaly Detection
Ashay Patel
Petru-Daniel Tudosiu
W. H. Pinaya
G. Cook
Vicky Goh
Sebastien Ourselin
M. Jorge Cardoso
OOD
ViT
MedIm
18
11
0
14 Apr 2023
Towards Evaluating Explanations of Vision Transformers for Medical Imaging
Piotr Komorowski
Hubert Baniecki
P. Biecek
MedIm
23
27
0
12 Apr 2023
HST-MRF: Heterogeneous Swin Transformer with Multi-Receptive Field for Medical Image Segmentation
Xiaofei Huang
Hongfang Gong
Jin Zhang
MedIm
13
2
0
10 Apr 2023
DIR-AS: Decoupling Individual Identification and Temporal Reasoning for Action Segmentation
Peiyao Wang
Haibin Ling
12
2
0
04 Apr 2023
3Mformer: Multi-order Multi-mode Transformer for Skeletal Action Recognition
Lei Wang
Piotr Koniusz
ViT
21
45
0
25 Mar 2023
Quality evaluation of point clouds: a novel no-reference approach using transformer-based architecture
M. Tliba
A. Chetouani
G. Valenzise
Frederic Dufaux
3DPC
6
1
0
15 Mar 2023
Transformer Encoder with Multiscale Deep Learning for Pain Classification Using Physiological Signals
Zhenyu Lu
Burcu Ozek
S. Kamarthi
ViT
MedIm
16
14
0
13 Mar 2023
Self-attention in Vision Transformers Performs Perceptual Grouping, Not Attention
Paria Mehrani
John K. Tsotsos
13
24
0
02 Mar 2023
Human MotionFormer: Transferring Human Motions with Vision Transformers
Hongyu Liu
Xintong Han
Chengbin Jin
Lihui Qian
Huawei Wei
...
Faqiang Wang
Haoye Dong
Yibing Song
Jia Xu
Qifeng Chen
11
10
0
22 Feb 2023
Efficiency 360: Efficient Vision Transformers
Badri N. Patro
Vijay Srinivas Agneeswaran
19
6
0
16 Feb 2023
X-ReID: Cross-Instance Transformer for Identity-Level Person Re-Identification
Leqi Shen
Tao He
Yuchen Guo
Guiguang Ding
29
5
0
04 Feb 2023
Out of Distribution Performance of State of Art Vision Model
Salman Rahman
W. Lee
18
2
0
25 Jan 2023
Attribute-Guided Multi-Level Attention Network for Fine-Grained Fashion Retrieval
L. Xiao
T. Yamasaki
AI4TS
16
2
0
27 Dec 2022
SMMix: Self-Motivated Image Mixing for Vision Transformers
Mengzhao Chen
Mingbao Lin
Zhihang Lin
Yu-xin Zhang
Fei Chao
Rongrong Ji
31
10
0
26 Dec 2022
Rethinking Vision Transformers for MobileNet Size and Speed
Yanyu Li
Ju Hu
Yang Wen
Georgios Evangelidis
Kamyar Salahi
Yanzhi Wang
Sergey Tulyakov
Jian Ren
ViT
21
157
0
15 Dec 2022
Vision Transformer Computation and Resilience for Dynamic Inference
Kavya Sreedhar
Jason Clemons
Rangharajan Venkatesan
S. Keckler
M. Horowitz
13
2
0
06 Dec 2022
ObjCAViT: Improving Monocular Depth Estimation Using Natural Language Models And Image-Object Cross-Attention
Dylan Auty
K. Mikolajczyk
VLM
10
3
0
30 Nov 2022
Bi-directional Feature Reconstruction Network for Fine-Grained Few-Shot Image Classification
Jijie Wu
Dongliang Chang
Aneeshan Sain
Xiaoxu Li
Zhanyu Ma
Jie Cao
Jun Guo
Yi-Zhe Song
19
35
0
30 Nov 2022
Hierarchical Transformer for Survival Prediction Using Multimodality Whole Slide Images and Genomics
Chunyuan Li
Xinliang Zhu
Jiawen Yao
Junzhou Huang
MedIm
20
11
0
29 Nov 2022
TetraDiffusion: Tetrahedral Diffusion Models for 3D Shape Generation
Nikolai Kalischek
T. Peters
Jan Dirk Wegner
Konrad Schindler
DiffM
19
12
0
23 Nov 2022
Conv2Former: A Simple Transformer-Style ConvNet for Visual Recognition
Qibin Hou
Cheng Lu
Ming-Ming Cheng
Jiashi Feng
ViT
23
129
0
22 Nov 2022
Peeling the Onion: Hierarchical Reduction of Data Redundancy for Efficient Vision Transformer Training
Zhenglun Kong
Haoyu Ma
Geng Yuan
Mengshu Sun
Yanyue Xie
...
Tianlong Chen
Xiaolong Ma
Xiaohui Xie
Zhangyang Wang
Yanzhi Wang
ViT
26
22
0
19 Nov 2022
TORE: Token Reduction for Efficient Human Mesh Recovery with Transformer
Zhiyang Dou
Qingxuan Wu
Chu-Hsing Lin
Zeyu Cao
Qiangqiang Wu
Weilin Wan
Taku Komura
Wenping Wang
24
39
0
19 Nov 2022
Decomposed Soft Prompt Guided Fusion Enhancing for Compositional Zero-Shot Learning
Xiaocheng Lu
Ziming Liu
Song Guo
Jingcai Guo
CoGe
10
30
0
19 Nov 2022
AU-Aware Vision Transformers for Biased Facial Expression Recognition
Shuyi Mao
Xinpeng Li
Q. Wu
Xiaojiang Peng
ViT
28
2
0
12 Nov 2022
ViTALiTy: Unifying Low-rank and Sparse Approximation for Vision Transformer Acceleration with a Linear Taylor Attention
Jyotikrishna Dass
Shang Wu
Huihong Shi
Chaojian Li
Zhifan Ye
Zhongfeng Wang
Yingyan Lin
15
49
0
09 Nov 2022
Attention Swin U-Net: Cross-Contextual Attention Mechanism for Skin Lesion Segmentation
Ehsan Khodapanah Aghdam
Reza Azad
Maral Zarvani
Dorit Merhof
ViT
SSeg
MedIm
26
47
0
30 Oct 2022
Grafting Vision Transformers
Jong Sung Park
Kumara Kahatapitiya
Donghyun Kim
Shivchander Sudalairaj
Quanfu Fan
Michael S. Ryoo
ViT
18
2
0
28 Oct 2022
Iterative Patch Selection for High-Resolution Image Recognition
Benjamin Bergner
C. Lippert
Aravindh Mahendran
6
12
0
24 Oct 2022
LCPFormer: Towards Effective 3D Point Cloud Analysis via Local Context Propagation in Transformers
Zhuo Huang
Zhiyou Zhao
Banghuai Li
Jungong Han
3DPC
ViT
23
55
0
23 Oct 2022
Face Pyramid Vision Transformer
Khawar Islam
M. Zaheer
Arif Mahmood
ViT
CVBM
17
4
0
21 Oct 2022
Cross-Modal Fusion Distillation for Fine-Grained Sketch-Based Image Retrieval
Abhra Chaudhuri
Massimiliano Mancini
Yanbei Chen
Zeynep Akata
Anjan Dutta
16
5
0
19 Oct 2022
Multi-view Gait Recognition based on Siamese Vision Transformer
Yanchen Yang
Lijun Yun
Ruoyu Li
Feiyan Cheng
21
5
0
19 Oct 2022
SaiT: Sparse Vision Transformers through Adaptive Token Pruning
Ling Li
D. Thorsley
Joseph Hassoun
ViT
25
17
0
11 Oct 2022
Coded Residual Transform for Generalizable Deep Metric Learning
Shichao Kan
Yixiong Liang
Min Li
Yigang Cen
Jianxin Wang
Z. He
29
3
0
09 Oct 2022
The Lie Derivative for Measuring Learned Equivariance
Nate Gruver
Marc Finzi
Micah Goldblum
A. Wilson
14
34
0
06 Oct 2022
Unbiased Scene Graph Generation using Predicate Similarities
Misaki Ohashi
Yusuke Matsui
25
1
0
03 Oct 2022
Effective Vision Transformer Training: A Data-Centric Perspective
Benjia Zhou
Pichao Wang
Jun Wan
Yan-Ni Liang
Fan Wang
24
5
0
29 Sep 2022
PPT: token-Pruned Pose Transformer for monocular and multi-view human pose estimation
Haoyu Ma
Zhe Wang
Yifei Chen
Deying Kong
Liangjian Chen
Xingwei Liu
Xiangyi Yan
Hao Tang
Xiaohui Xie
ViT
35
47
0
16 Sep 2022
A patch-based architecture for multi-label classification from single label annotations
Warren Jouanneau
Aurélie Bugeau
Marc Palyart
Nicolas Papadakis
Laurent Vézard
20
0
0
14 Sep 2022