2103.14899
Cited By
CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification
27 March 2021
Chun-Fu Chen
Quanfu Fan
Rameswar Panda
ViT
Papers citing
"CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification"
50 / 175 papers shown
Title
Multi-Feature Vision Transformer via Self-Supervised Representation Learning for Improvement of COVID-19 Diagnosis
Xiao Qi
D. Foran
J. Nosher
I. Hacihaliloglu
ViT
MedIm
9
3
0
03 Aug 2022
Making the Best of Both Worlds: A Domain-Oriented Transformer for Unsupervised Domain Adaptation
Wen-hui Ma
Jinming Zhang
Shuang Li
Chi Harold Liu
Yulin Wang
Wei Li
11
14
0
02 Aug 2022
Jigsaw-ViT: Learning Jigsaw Puzzles in Vision Transformer
Yingyi Chen
Xiaoke Shen
Yahui Liu
Qinghua Tao
Johan A. K. Suykens
AAML
ViT
21
22
0
25 Jul 2022
Vision Transformers: From Semantic Segmentation to Dense Prediction
Li Zhang
Jiachen Lu
Sixiao Zheng
Xinxuan Zhao
Xiatian Zhu
Yanwei Fu
Tao Xiang
Jianfeng Feng
Philip H. S. Torr
ViT
19
7
0
19 Jul 2022
HiFormer: Hierarchical Multi-scale Representations Using Transformers for Medical Image Segmentation
Moein Heidari
A. Kazerouni
Milad Soltany Kadarvish
Reza Azad
Ehsan Khodapanah Aghdam
Julien Cohen-Adad
Dorit Merhof
MedIm
ViT
25
171
0
18 Jul 2022
MSP-Former: Multi-Scale Projection Transformer for Single Image Desnowing
Sixiang Chen
Tian-Chun Ye
Yun-Peng Liu
Taodong Liao
Y. Ye
Erkang Chen
Peng Chen
ViT
18
51
0
12 Jul 2022
Dual Vision Transformer
Ting Yao
Yehao Li
Yingwei Pan
Yu Wang
Xiaoping Zhang
Tao Mei
ViT
141
75
0
11 Jul 2022
Learning Cross-Image Object Semantic Relation in Transformer for Few-Shot Fine-Grained Image Classification
Bo-Wen Zhang
Jiakang Yuan
Baopu Li
Tao Chen
Jiayuan Fan
Botian Shi
ViT
9
31
0
02 Jul 2022
CDNet: Contrastive Disentangled Network for Fine-Grained Image Categorization of Ocular B-Scan Ultrasound
Ruilong Dan
Yunxiang Li
Yijie Wang
Gangyong Jia
Ruiquan Ge
Juan Ye
Qun Jin
Yaqi Wang
21
8
0
17 Jun 2022
SP-ViT: Learning 2D Spatial Priors for Vision Transformers
Yuxuan Zhou
Wangmeng Xiang
C. Li
Biao Wang
Xihan Wei
Lei Zhang
M. Keuper
Xia Hua
ViT
19
15
0
15 Jun 2022
Multimodal Learning with Transformers: A Survey
P. Xu
Xiatian Zhu
David A. Clifton
ViT
41
522
0
13 Jun 2022
INDIGO: Intrinsic Multimodality for Domain Generalization
Puneet Mangla
Shivam Chandhok
Milan Aggarwal
V. Balasubramanian
Balaji Krishnamurthy
VLM
28
2
0
13 Jun 2022
HiViT: Hierarchical Vision Transformer Meets Masked Image Modeling
Xiaosong Zhang
Yunjie Tian
Wei Huang
QiXiang Ye
Qi Dai
Lingxi Xie
Qi Tian
50
26
0
30 May 2022
Degradation-Aware Unfolding Half-Shuffle Transformer for Spectral Compressive Imaging
Yuanhao Cai
Jing Lin
Haoqian Wang
Xin Yuan
Henghui Ding
Yulun Zhang
Radu Timofte
Luc Van Gool
70
116
0
20 May 2022
Cross-Enhancement Transformer for Action Segmentation
Jiahui Wang
Zhenyou Wang
Shanna Zhuang
Hui Wang
ViT
46
23
0
19 May 2022
Deep Spectral Methods: A Surprisingly Strong Baseline for Unsupervised Semantic Segmentation and Localization
Luke Melas-Kyriazi
Christian Rupprecht
Iro Laina
Andrea Vedaldi
28
159
0
16 May 2022
Noise-reducing attention cross fusion learning transformer for histological image classification of osteosarcoma
Liangrui Pan
Hetian Wang
Lian-min Wang
Boya Ji
Mingting Liu
M. Chongcheawchamnan
Yuan Jin
Shaoliang Peng
ViT
MedIm
9
34
0
29 Apr 2022
Deeper Insights into the Robustness of ViTs towards Common Corruptions
Rui Tian
Zuxuan Wu
Qi Dai
Han Hu
Yu-Gang Jiang
ViT
AAML
11
4
0
26 Apr 2022
MST++: Multi-stage Spectral-wise Transformer for Efficient Spectral Reconstruction
Yuanhao Cai
Jing Lin
Zudi Lin
Haoqian Wang
Yulun Zhang
Hanspeter Pfister
Radu Timofte
Luc Van Gool
19
170
0
17 Apr 2022
MiniViT: Compressing Vision Transformers with Weight Multiplexing
Jinnian Zhang
Houwen Peng
Kan Wu
Mengchen Liu
Bin Xiao
Jianlong Fu
Lu Yuan
ViT
11
123
0
14 Apr 2022
Multimodal Transformer for Nursing Activity Recognition
Momal Ijaz
Renato Diaz
C. L. P. Chen
ViT
22
26
0
09 Apr 2022
POSTER: A Pyramid Cross-Fusion Transformer Network for Facial Expression Recognition
Ce Zheng
Matías Mendieta
C. L. P. Chen
ViT
11
54
0
08 Apr 2022
Multi-scale Context-aware Network with Transformer for Gait Recognition
Duo-Lin Zhu
Xiaohui Huang
Xinggang Wang
Bo Yang
Botao He
Wenyu Liu
Bin Feng
ViT
CVBM
14
15
0
07 Apr 2022
Deep Hyperspectral Unmixing using Transformer Network
Preetam Ghosh
S. K. Roy
Bikram Koirala
Behnood Rasti
P. Scheunders
ViT
30
81
0
31 Mar 2022
Focal Modulation Networks
Jianwei Yang
Chunyuan Li
Xiyang Dai
Lu Yuan
Jianfeng Gao
3DPC
22
263
0
22 Mar 2022
HIPA: Hierarchical Patch Transformer for Single Image Super Resolution
Qing Cai
Yiming Qian
Jinxing Li
Junjie Lv
Yee-Hong Yang
Feng Wu
Dafan Zhang
17
28
0
19 Mar 2022
DFTR: Depth-supervised Fusion Transformer for Salient Object Detection
Heqin Zhu
Xu Sun
Yuexiang Li
Kai Ma
S. Kevin Zhou
Yefeng Zheng
ViT
31
9
0
12 Mar 2022
The Principle of Diversity: Training Stronger Vision Transformers Calls for Reducing All Levels of Redundancy
Tianlong Chen
Zhenyu (Allen) Zhang
Yu Cheng
Ahmed Hassan Awadallah
Zhangyang Wang
ViT
25
37
0
12 Mar 2022
Representation Compensation Networks for Continual Semantic Segmentation
Chang-Bin Zhang
Jianqiang Xiao
Xialei Liu
Ying-Cong Chen
Ming-Ming Cheng
SSeg
CLL
13
93
0
10 Mar 2022
Joint rotational invariance and adversarial training of a dual-stream Transformer yields state of the art Brain-Score for Area V4
William Berrios
Arturo Deza
MedIm
ViT
12
13
0
08 Mar 2022
Recent Advances in Vision Transformer: A Survey and Outlook of Recent Work
Khawar Islam
ViT
24
44
0
03 Mar 2022
Learn From the Past: Experience Ensemble Knowledge Distillation
Chaofei Wang
Shaowei Zhang
S. Song
Gao Huang
17
4
0
25 Feb 2022
DynaMixer: A Vision MLP Architecture with Dynamic Mixing
Ziyu Wang
Wenhao Jiang
Yiming Zhu
Li Yuan
Yibing Song
Wei Liu
19
43
0
28 Jan 2022
Multiview Transformers for Video Recognition
Shen Yan
Xuehan Xiong
Anurag Arnab
Zhichao Lu
Mi Zhang
Chen Sun
Cordelia Schmid
ViT
24
211
0
12 Jan 2022
Pale Transformer: A General Vision Transformer Backbone with Pale-Shaped Attention
Sitong Wu
Tianyi Wu
Hao Hao Tan
G. Guo
ViT
23
70
0
28 Dec 2021
MPViT: Multi-Path Vision Transformer for Dense Prediction
Youngwan Lee
Jonghee Kim
Jeffrey Willette
Sung Ju Hwang
ViT
11
243
0
21 Dec 2021
FaceFormer: Speech-Driven 3D Facial Animation with Transformers
Yingruo Fan
Zhaojiang Lin
Jun Saito
Wenping Wang
Taku Komura
CVBM
38
194
0
10 Dec 2021
Learning Tracking Representations via Dual-Branch Fully Transformer Networks
Fei Xie
Chunyu Wang
Guangting Wang
Wankou Yang
Wenjun Zeng
ViT
12
47
0
05 Dec 2021
VT-CLIP: Enhancing Vision-Language Models with Visual-guided Texts
Longtian Qiu
Renrui Zhang
Ziyu Guo
Wei Zhang
Zilu Guo
Ziyao Zeng
Guangnan Zhang
VLM
CLIP
20
45
0
04 Dec 2021
MViTv2: Improved Multiscale Vision Transformers for Classification and Detection
Yanghao Li
Chao-Yuan Wu
Haoqi Fan
K. Mangalam
Bo Xiong
Jitendra Malik
Christoph Feichtenhofer
ViT
46
676
0
02 Dec 2021
Uni-Perceiver: Pre-training Unified Architecture for Generic Perception for Zero-shot and Few-shot Tasks
Xizhou Zhu
Jinguo Zhu
Hao Li
Xiaoshi Wu
Xiaogang Wang
Hongsheng Li
Xiaohua Wang
Jifeng Dai
36
128
0
02 Dec 2021
SwinTrack: A Simple and Strong Baseline for Transformer Tracking
Liting Lin
Heng Fan
Zhipeng Zhang
Yong-mei Xu
Haibin Ling
ViT
10
301
0
02 Dec 2021
Shunted Self-Attention via Multi-Scale Token Aggregation
Sucheng Ren
Daquan Zhou
Shengfeng He
Jiashi Feng
Xinchao Wang
ViT
25
222
0
30 Nov 2021
SWAT: Spatial Structure Within and Among Tokens
Kumara Kahatapitiya
Michael S. Ryoo
20
6
0
26 Nov 2021
Exploiting Both Domain-specific and Invariant Knowledge via a Win-win Transformer for Unsupervised Domain Adaptation
Wen-hui Ma
Jinming Zhang
Shuang Li
Chi Harold Liu
Yulin Wang
Wei Li
ViT
16
11
0
25 Nov 2021
MHFormer: Multi-Hypothesis Transformer for 3D Human Pose Estimation
Wenhao Li
Hong Liu
H. Tang
Pichao Wang
Luc Van Gool
ViT
27
245
0
24 Nov 2021
Self-slimmed Vision Transformer
Zhuofan Zong
Kunchang Li
Guanglu Song
Yali Wang
Yu Qiao
B. Leng
Yu Liu
ViT
14
30
0
24 Nov 2021
An Image Patch is a Wave: Phase-Aware Vision MLP
Yehui Tang
Kai Han
Jianyuan Guo
Chang Xu
Yanxi Li
Chao Xu
Yunhe Wang
11
133
0
24 Nov 2021
PhysFormer: Facial Video-based Physiological Measurement with Temporal Difference Transformer
Zitong Yu
Yuming Shen
Jingang Shi
Hengshuang Zhao
Philip H. S. Torr
Guoying Zhao
ViT
MedIm
129
166
0
23 Nov 2021
PointMixer: MLP-Mixer for Point Cloud Understanding
Jaesung Choe
Chunghyun Park
François Rameau
Jaesik Park
In So Kweon
3DPC
32
98
0
22 Nov 2021