ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2103.14899
  4. Cited By
CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image
  Classification

CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification

27 March 2021
Chun-Fu Chen
Quanfu Fan
Rameswar Panda
    ViT
ArXivPDFHTML

Papers citing "CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification"

50 / 175 papers shown
Title
Multi-Feature Vision Transformer via Self-Supervised Representation
  Learning for Improvement of COVID-19 Diagnosis
Multi-Feature Vision Transformer via Self-Supervised Representation Learning for Improvement of COVID-19 Diagnosis
Xiao Qi
D. Foran
J. Nosher
I. Hacihaliloglu
ViT
MedIm
9
3
0
03 Aug 2022
Making the Best of Both Worlds: A Domain-Oriented Transformer for
  Unsupervised Domain Adaptation
Making the Best of Both Worlds: A Domain-Oriented Transformer for Unsupervised Domain Adaptation
Wen-hui Ma
Jinming Zhang
Shuang Li
Chi Harold Liu
Yulin Wang
Wei Li
11
14
0
02 Aug 2022
Jigsaw-ViT: Learning Jigsaw Puzzles in Vision Transformer
Jigsaw-ViT: Learning Jigsaw Puzzles in Vision Transformer
Yingyi Chen
Xiaoke Shen
Yahui Liu
Qinghua Tao
Johan A. K. Suykens
AAML
ViT
21
22
0
25 Jul 2022
Vision Transformers: From Semantic Segmentation to Dense Prediction
Vision Transformers: From Semantic Segmentation to Dense Prediction
Li Zhang
Jiachen Lu
Sixiao Zheng
Xinxuan Zhao
Xiatian Zhu
Yanwei Fu
Tao Xiang
Jianfeng Feng
Philip H. S. Torr
ViT
19
7
0
19 Jul 2022
HiFormer: Hierarchical Multi-scale Representations Using Transformers
  for Medical Image Segmentation
HiFormer: Hierarchical Multi-scale Representations Using Transformers for Medical Image Segmentation
Moein Heidari
A. Kazerouni
Milad Soltany Kadarvish
Reza Azad
Ehsan Khodapanah Aghdam
Julien Cohen-Adad
Dorit Merhof
MedIm
ViT
25
171
0
18 Jul 2022
MSP-Former: Multi-Scale Projection Transformer for Single Image
  Desnowing
MSP-Former: Multi-Scale Projection Transformer for Single Image Desnowing
Sixiang Chen
Tian-Chun Ye
Yun-Peng Liu
Taodong Liao
Y. Ye
Erkang Chen
Peng Chen
ViT
18
51
0
12 Jul 2022
Dual Vision Transformer
Dual Vision Transformer
Ting Yao
Yehao Li
Yingwei Pan
Yu Wang
Xiaoping Zhang
Tao Mei
ViT
141
75
0
11 Jul 2022
Learning Cross-Image Object Semantic Relation in Transformer for
  Few-Shot Fine-Grained Image Classification
Learning Cross-Image Object Semantic Relation in Transformer for Few-Shot Fine-Grained Image Classification
Bo-Wen Zhang
Jiakang Yuan
Baopu Li
Tao Chen
Jiayuan Fan
Botian Shi
ViT
9
31
0
02 Jul 2022
CDNet: Contrastive Disentangled Network for Fine-Grained Image
  Categorization of Ocular B-Scan Ultrasound
CDNet: Contrastive Disentangled Network for Fine-Grained Image Categorization of Ocular B-Scan Ultrasound
Ruilong Dan
Yunxiang Li
Yijie Wang
Gangyong Jia
Ruiquan Ge
Juan Ye
Qun Jin
Yaqi Wang
21
8
0
17 Jun 2022
SP-ViT: Learning 2D Spatial Priors for Vision Transformers
SP-ViT: Learning 2D Spatial Priors for Vision Transformers
Yuxuan Zhou
Wangmeng Xiang
C. Li
Biao Wang
Xihan Wei
Lei Zhang
M. Keuper
Xia Hua
ViT
19
15
0
15 Jun 2022
Multimodal Learning with Transformers: A Survey
Multimodal Learning with Transformers: A Survey
P. Xu
Xiatian Zhu
David A. Clifton
ViT
41
522
0
13 Jun 2022
INDIGO: Intrinsic Multimodality for Domain Generalization
INDIGO: Intrinsic Multimodality for Domain Generalization
Puneet Mangla
Shivam Chandhok
Milan Aggarwal
V. Balasubramanian
Balaji Krishnamurthy
VLM
28
2
0
13 Jun 2022
HiViT: Hierarchical Vision Transformer Meets Masked Image Modeling
HiViT: Hierarchical Vision Transformer Meets Masked Image Modeling
Xiaosong Zhang
Yunjie Tian
Wei Huang
QiXiang Ye
Qi Dai
Lingxi Xie
Qi Tian
50
26
0
30 May 2022
Degradation-Aware Unfolding Half-Shuffle Transformer for Spectral
  Compressive Imaging
Degradation-Aware Unfolding Half-Shuffle Transformer for Spectral Compressive Imaging
Yuanhao Cai
Jing Lin
Haoqian Wang
Xin Yuan
Henghui Ding
Yulun Zhang
Radu Timofte
Luc Van Gool
70
116
0
20 May 2022
Cross-Enhancement Transformer for Action Segmentation
Cross-Enhancement Transformer for Action Segmentation
Jiahui Wang
Zhenyou Wang
Shanna Zhuang
Hui Wang
ViT
46
23
0
19 May 2022
Deep Spectral Methods: A Surprisingly Strong Baseline for Unsupervised
  Semantic Segmentation and Localization
Deep Spectral Methods: A Surprisingly Strong Baseline for Unsupervised Semantic Segmentation and Localization
Luke Melas-Kyriazi
Christian Rupprecht
Iro Laina
Andrea Vedaldi
28
159
0
16 May 2022
Noise-reducing attention cross fusion learning transformer for
  histological image classification of osteosarcoma
Noise-reducing attention cross fusion learning transformer for histological image classification of osteosarcoma
Liangrui Pan
Hetian Wang
Lian-min Wang
Boya Ji
Mingting Liu
M. Chongcheawchamnan
Yuan Jin
Shaoliang Peng
ViT
MedIm
9
34
0
29 Apr 2022
Deeper Insights into the Robustness of ViTs towards Common Corruptions
Deeper Insights into the Robustness of ViTs towards Common Corruptions
Rui Tian
Zuxuan Wu
Qi Dai
Han Hu
Yu-Gang Jiang
ViT
AAML
11
4
0
26 Apr 2022
MST++: Multi-stage Spectral-wise Transformer for Efficient Spectral
  Reconstruction
MST++: Multi-stage Spectral-wise Transformer for Efficient Spectral Reconstruction
Yuanhao Cai
Jing Lin
Zudi Lin
Haoqian Wang
Yulun Zhang
Hanspeter Pfister
Radu Timofte
Luc Van Gool
19
170
0
17 Apr 2022
MiniViT: Compressing Vision Transformers with Weight Multiplexing
MiniViT: Compressing Vision Transformers with Weight Multiplexing
Jinnian Zhang
Houwen Peng
Kan Wu
Mengchen Liu
Bin Xiao
Jianlong Fu
Lu Yuan
ViT
11
123
0
14 Apr 2022
Multimodal Transformer for Nursing Activity Recognition
Multimodal Transformer for Nursing Activity Recognition
Momal Ijaz
Renato Diaz
C. L. P. Chen
ViT
22
26
0
09 Apr 2022
POSTER: A Pyramid Cross-Fusion Transformer Network for Facial Expression
  Recognition
POSTER: A Pyramid Cross-Fusion Transformer Network for Facial Expression Recognition
Ce Zheng
Matías Mendieta
C. L. P. Chen
ViT
11
54
0
08 Apr 2022
Multi-scale Context-aware Network with Transformer for Gait Recognition
Multi-scale Context-aware Network with Transformer for Gait Recognition
Duo-Lin Zhu
Xiaohui Huang
Xinggang Wang
Bo Yang
Botao He
Wenyu Liu
Bin Feng
ViT
CVBM
14
15
0
07 Apr 2022
Deep Hyperspectral Unmixing using Transformer Network
Deep Hyperspectral Unmixing using Transformer Network
Preetam Ghosh
S. K. Roy
Bikram Koirala
Behnood Rasti
P. Scheunders
ViT
30
81
0
31 Mar 2022
Focal Modulation Networks
Focal Modulation Networks
Jianwei Yang
Chunyuan Li
Xiyang Dai
Lu Yuan
Jianfeng Gao
3DPC
22
263
0
22 Mar 2022
HIPA: Hierarchical Patch Transformer for Single Image Super Resolution
HIPA: Hierarchical Patch Transformer for Single Image Super Resolution
Qing Cai
Yiming Qian
Jinxing Li
Junjie Lv
Yee-Hong Yang
Feng Wu
Dafan Zhang
17
28
0
19 Mar 2022
DFTR: Depth-supervised Fusion Transformer for Salient Object Detection
DFTR: Depth-supervised Fusion Transformer for Salient Object Detection
Heqin Zhu
Xu Sun
Yuexiang Li
Kai Ma
S. Kevin Zhou
Yefeng Zheng
ViT
31
9
0
12 Mar 2022
The Principle of Diversity: Training Stronger Vision Transformers Calls
  for Reducing All Levels of Redundancy
The Principle of Diversity: Training Stronger Vision Transformers Calls for Reducing All Levels of Redundancy
Tianlong Chen
Zhenyu (Allen) Zhang
Yu Cheng
Ahmed Hassan Awadallah
Zhangyang Wang
ViT
25
37
0
12 Mar 2022
Representation Compensation Networks for Continual Semantic Segmentation
Representation Compensation Networks for Continual Semantic Segmentation
Chang-Bin Zhang
Jianqiang Xiao
Xialei Liu
Ying-Cong Chen
Mingg-Ming Cheng
SSeg
CLL
13
93
0
10 Mar 2022
Joint rotational invariance and adversarial training of a dual-stream
  Transformer yields state of the art Brain-Score for Area V4
Joint rotational invariance and adversarial training of a dual-stream Transformer yields state of the art Brain-Score for Area V4
William Berrios
Arturo Deza
MedIm
ViT
12
13
0
08 Mar 2022
Recent Advances in Vision Transformer: A Survey and Outlook of Recent
  Work
Recent Advances in Vision Transformer: A Survey and Outlook of Recent Work
Khawar Islam
ViT
24
44
0
03 Mar 2022
Learn From the Past: Experience Ensemble Knowledge Distillation
Learn From the Past: Experience Ensemble Knowledge Distillation
Chaofei Wang
Shaowei Zhang
S. Song
Gao Huang
17
4
0
25 Feb 2022
DynaMixer: A Vision MLP Architecture with Dynamic Mixing
DynaMixer: A Vision MLP Architecture with Dynamic Mixing
Ziyu Wang
Wenhao Jiang
Yiming Zhu
Li Yuan
Yibing Song
Wei Liu
19
43
0
28 Jan 2022
Multiview Transformers for Video Recognition
Multiview Transformers for Video Recognition
Shen Yan
Xuehan Xiong
Anurag Arnab
Zhichao Lu
Mi Zhang
Chen Sun
Cordelia Schmid
ViT
24
211
0
12 Jan 2022
Pale Transformer: A General Vision Transformer Backbone with Pale-Shaped
  Attention
Pale Transformer: A General Vision Transformer Backbone with Pale-Shaped Attention
Sitong Wu
Tianyi Wu
Hao Hao Tan
G. Guo
ViT
23
70
0
28 Dec 2021
MPViT: Multi-Path Vision Transformer for Dense Prediction
MPViT: Multi-Path Vision Transformer for Dense Prediction
Youngwan Lee
Jonghee Kim
Jeffrey Willette
Sung Ju Hwang
ViT
11
243
0
21 Dec 2021
FaceFormer: Speech-Driven 3D Facial Animation with Transformers
FaceFormer: Speech-Driven 3D Facial Animation with Transformers
Yingruo Fan
Zhaojiang Lin
Jun Saito
Wenping Wang
Taku Komura
CVBM
38
194
0
10 Dec 2021
Learning Tracking Representations via Dual-Branch Fully Transformer
  Networks
Learning Tracking Representations via Dual-Branch Fully Transformer Networks
Fei Xie
Chunyu Wang
Guangting Wang
Wankou Yang
Wenjun Zeng
ViT
12
47
0
05 Dec 2021
VT-CLIP: Enhancing Vision-Language Models with Visual-guided Texts
VT-CLIP: Enhancing Vision-Language Models with Visual-guided Texts
Longtian Qiu
Renrui Zhang
Ziyu Guo
Wei Zhang
Zilu Guo
Ziyao Zeng
Guangnan Zhang
VLM
CLIP
20
45
0
04 Dec 2021
MViTv2: Improved Multiscale Vision Transformers for Classification and
  Detection
MViTv2: Improved Multiscale Vision Transformers for Classification and Detection
Yanghao Li
Chaoxia Wu
Haoqi Fan
K. Mangalam
Bo Xiong
Jitendra Malik
Christoph Feichtenhofer
ViT
46
676
0
02 Dec 2021
Uni-Perceiver: Pre-training Unified Architecture for Generic Perception
  for Zero-shot and Few-shot Tasks
Uni-Perceiver: Pre-training Unified Architecture for Generic Perception for Zero-shot and Few-shot Tasks
Xizhou Zhu
Jinguo Zhu
Hao Li
Xiaoshi Wu
Xiaogang Wang
Hongsheng Li
Xiaohua Wang
Jifeng Dai
36
128
0
02 Dec 2021
SwinTrack: A Simple and Strong Baseline for Transformer Tracking
SwinTrack: A Simple and Strong Baseline for Transformer Tracking
Liting Lin
Heng Fan
Zhipeng Zhang
Yong-mei Xu
Haibin Ling
ViT
10
301
0
02 Dec 2021
Shunted Self-Attention via Multi-Scale Token Aggregation
Shunted Self-Attention via Multi-Scale Token Aggregation
Sucheng Ren
Daquan Zhou
Shengfeng He
Jiashi Feng
Xinchao Wang
ViT
25
222
0
30 Nov 2021
SWAT: Spatial Structure Within and Among Tokens
SWAT: Spatial Structure Within and Among Tokens
Kumara Kahatapitiya
Michael S. Ryoo
20
6
0
26 Nov 2021
Exploiting Both Domain-specific and Invariant Knowledge via a Win-win
  Transformer for Unsupervised Domain Adaptation
Exploiting Both Domain-specific and Invariant Knowledge via a Win-win Transformer for Unsupervised Domain Adaptation
Wen-hui Ma
Jinming Zhang
Shuang Li
Chi Harold Liu
Yulin Wang
Wei Li
ViT
16
11
0
25 Nov 2021
MHFormer: Multi-Hypothesis Transformer for 3D Human Pose Estimation
MHFormer: Multi-Hypothesis Transformer for 3D Human Pose Estimation
Wenhao Li
Hong Liu
H. Tang
Pichao Wang
Luc Van Gool
ViT
27
245
0
24 Nov 2021
Self-slimmed Vision Transformer
Self-slimmed Vision Transformer
Zhuofan Zong
Kunchang Li
Guanglu Song
Yali Wang
Yu Qiao
B. Leng
Yu Liu
ViT
14
30
0
24 Nov 2021
An Image Patch is a Wave: Phase-Aware Vision MLP
An Image Patch is a Wave: Phase-Aware Vision MLP
Yehui Tang
Kai Han
Jianyuan Guo
Chang Xu
Yanxi Li
Chao Xu
Yunhe Wang
11
133
0
24 Nov 2021
PhysFormer: Facial Video-based Physiological Measurement with Temporal
  Difference Transformer
PhysFormer: Facial Video-based Physiological Measurement with Temporal Difference Transformer
Zitong Yu
Yuming Shen
Jingang Shi
Hengshuang Zhao
Philip H. S. Torr
Guoying Zhao
ViT
MedIm
129
166
0
23 Nov 2021
PointMixer: MLP-Mixer for Point Cloud Understanding
PointMixer: MLP-Mixer for Point Cloud Understanding
Jaesung Choe
Chunghyun Park
François Rameau
Jaesik Park
In So Kweon
3DPC
32
98
0
22 Nov 2021
Previous
1234
Next