CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification

27 March 2021
Chun-Fu Chen
Quanfu Fan
Rameswar Panda
    ViT

Papers citing "CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification"

50 / 134 papers shown
Nonlinear Motion-Guided and Spatio-Temporal Aware Network for Unsupervised Event-Based Optical Flow
Zuntao Liu
Hao Zhuang
Junjie Jiang
Yuhang Song
Zheng Fang
43
0
0
08 May 2025
Balancing Accuracy, Calibration, and Efficiency in Active Learning with Vision Transformers Under Label Noise
Moseli Motsóehli
Hope Mogale
Kyungim Baek
38
0
0
07 May 2025
Image Recognition with Online Lightweight Vision Transformer: A Survey
Zherui Zhang
Rongtao Xu
Jie Zhou
Changwei Wang
Xingtian Pei
...
Jiguang Zhang
Li Guo
Longxiang Gao
W. Xu
Shibiao Xu
ViT
60
0
0
06 May 2025
Token Coordinated Prompt Attention is Needed for Visual Prompting
Zichen Liu
Xu Zou
Gang Hua
Jiahuan Zhou
26
0
0
05 May 2025
Multi-Scale Graph Learning for Anti-Sparse Downscaling
Yingda Fan
Runlong Yu
Janet R. Barclay
A. Appling
Yiming Sun
Yiqun Xie
Xiaowei Jia
AI4CE
18
0
0
03 May 2025
RadioFormer: A Multiple-Granularity Radio Map Estimation Transformer with 1‱ Spatial Sampling
Zheng Fang
Kangjun Liu
Ke Chen
Qingyu Liu
J. Zhang
Lingyang Song
Yaowei Wang
36
0
0
27 Apr 2025
Towards Accurate and Interpretable Neuroblastoma Diagnosis via Contrastive Multi-scale Pathological Image Analysis
Zhu Zhu
Shuo Jiang
Jingyuan Zheng
Yawen Li
Yifei Chen
Manli Zhao
Weizhong Gu
Feiwei Qin
Jinhu Wang
Gang Yu
MedIm
33
0
0
18 Apr 2025
ID-Booth: Identity-consistent Face Generation with Diffusion Models
Darian Tomašević
Fadi Boutros
Chenhao Lin
Naser Damer
Vitomir Štruc
Peter Peer
DiffM
55
1
0
10 Apr 2025
HOTFormerLoc: Hierarchical Octree Transformer for Versatile Lidar Place Recognition Across Ground and Aerial Views
Ethan Griffiths
Maryam Haghighat
Simon Denman
Clinton Fookes
Milad Ramezani
3DPC
57
0
0
11 Mar 2025
TSCnet: A Text-driven Semantic-level Controllable Framework for Customized Low-Light Image Enhancement
Miao Zhang
Jun Yin
Pengyu Zeng
Yiqing Shen
Shuai Lu
Xueqian Wang
DiffM
63
6
0
11 Mar 2025
Exploring Token-Level Augmentation in Vision Transformer for Semi-Supervised Semantic Segmentation
Dengke Zhang
Quan Tang
Fagui Liu
C. L. Philip Chen
Haiqing Mei
ViT
50
0
0
04 Mar 2025
SafeText: Safe Text-to-image Models via Aligning the Text Encoder
Yuepeng Hu
Zhengyuan Jiang
Neil Zhenqiang Gong
42
1
0
28 Feb 2025
Spiking Transformer: Introducing Accurate Addition-Only Spiking Self-Attention for Transformer
Yufei Guo
Xiaode Liu
Y. Chen
Weihang Peng
Yuhan Zhang
Zhe Ma
MQ
43
0
0
28 Feb 2025
Enhancing Vehicle Make and Model Recognition with 3D Attention Modules
Narges Semiromizadeh
Omid Nejati Manzari
S. B. Shokouhi
S. Mirzakuchaki
ViT
83
0
0
24 Feb 2025
ESIQA: Perceptual Quality Assessment of Vision-Pro-based Egocentric Spatial Images
Zhirui Kuai
Liu Yang
Huiyu Duan
Yuxing Han
Guoyu Tang
P. Callet
73
2
0
24 Feb 2025
QCS: Feature Refining from Quadruplet Cross Similarity for Facial Expression Recognition
C. Wang
Li Chen
Lili Wang
Zhaofan Li
Xuebin Lv
76
1
0
28 Jan 2025
Multiscaled Multi-Head Attention-based Video Transformer Network for Hand Gesture Recognition
Mallika Garg
Debashis Ghosh
P. M. Pradhan
SLR
30
15
0
03 Jan 2025
VMamba: Visual State Space Model
Yue Liu
Yunjie Tian
Yuzhong Zhao
Hongtian Yu
Lingxi Xie
Yaowei Wang
Qixiang Ye
Jianbin Jiao
Yunfan Liu
Mamba
106
592
0
31 Dec 2024
ShadowMamba: State-Space Model with Boundary-Region Selective Scan for Shadow Removal
Xiujin Zhu
Chee-Onn Chow
Joon Huang Chuah
Mamba
40
0
0
05 Nov 2024
AttriPrompter: Auto-Prompting with Attribute Semantics for Zero-shot Nuclei Detection via Visual-Language Pre-trained Models
Yongjian Wu
Yang Zhou
Jiya Saiyin
Bingzheng Wei
M. Lai
Jianzhong Shou
Yan Xu
VLM
MedIm
25
0
0
22 Oct 2024
A three-dimensional force estimation method for the cable-driven soft robot based on monocular images
Xiaohan Zhu
Ran Bu
Zhen Li
Fan Xu
Hesheng Wang
18
0
0
12 Sep 2024
Disparity Estimation Using a Quad-Pixel Sensor
Zhuofeng Wu
Doehyung Lee
Zihua Liu
Kazunori Yoshizaki
Yusuke Monno
Masatoshi Okutomi
MDE
18
1
0
01 Sep 2024
MeshUp: Multi-Target Mesh Deformation via Blended Score Distillation
Hyunwoo Kim
Itai Lang
Noam Aigerman
Thibault Groueix
Vladimir G. Kim
Rana Hanocka
AI4CE
37
3
0
27 Aug 2024
Mixed-View Panorama Synthesis using Geospatially Guided Diffusion
Zhexiao Xiong
Xin Xing
Scott Workman
Subash Khanal
Nathan Jacobs
DiffM
MDE
52
1
0
12 Jul 2024
iiANET: Inception Inspired Attention Hybrid Network for efficient Long-Range Dependency
Haruna Yunusa
Qin Shiyin
Abdulrahman Hamman Adama Chukkol
Isah Bello
A. Lawan
Isah Bello
39
4
0
10 Jul 2024
MambaVision: A Hybrid Mamba-Transformer Vision Backbone
Ali Hatamizadeh
Jan Kautz
Mamba
33
56
0
10 Jul 2024
Isomorphic Pruning for Vision Models
Gongfan Fang
Xinyin Ma
Michael Bi Mi
Xinchao Wang
VLM
ViT
34
6
0
05 Jul 2024
Towards Attention-based Contrastive Learning for Audio Spoof Detection
C. Goel
Surya Koppisetti
Ben Colman
Ali Shahriyari
Gaurav Bharaj
50
5
0
03 Jul 2024
Cross-Modal Attention Alignment Network with Auxiliary Text Description for zero-shot sketch-based image retrieval
Hanwen Su
G. Song
K. Huang
Jiyan Wang
Ming Yang
41
1
0
01 Jul 2024
The 3D-PC: a benchmark for visual perspective taking in humans and machines
Drew Linsley
Peisen Zhou
A. Ashok
Akash Nagaraj
Gaurav Gaonkar
Francis E Lewis
Zygmunt Pizlo
Thomas Serre
41
6
0
06 Jun 2024
A Survey of Transformer Enabled Time Series Synthesis
Alexander Sommers
Logan Cummins
Sudip Mittal
Shahram Rahimi
Maria Seale
Joseph Jaboure
Thomas Arnold
AI4TS
33
2
0
04 Jun 2024
SurgiTrack: Fine-Grained Multi-Class Multi-Tool Tracking in Surgical Videos
C. Nwoye
N. Padoy
22
2
0
30 May 2024
Improving global awareness of linkset predictions using Cross-Attentive Modulation tokens
Félix Marcoccia
C. Adjih
P. Mühlethaler
28
0
0
28 May 2024
Vision Transformer with Sparse Scan Prior
Qihang Fan
Huaibo Huang
Mingrui Chen
Ran He
ViT
36
4
0
22 May 2024
Dynamic Identity-Guided Attention Network for Visible-Infrared Person Re-identification
Peng Gao
Yujian Lee
Hui Zhang
Xubo Liu
Yiyang Hu
Guquan Jing
16
1
0
21 May 2024
Visual Language Model based Cross-modal Semantic Communication Systems
Feibo Jiang
Chuanguo Tang
Li Dong
Kezhi Wang
Kun Yang
Cunhua Pan
VLM
31
2
0
06 May 2024
Unsupervised Dynamics Prediction with Object-Centric Kinematics
Yeon-Ji Song
Suhyung Choi
Jaein Kim
Jin-Hwa Kim
Byoung-Tak Zhang
31
0
0
29 Apr 2024
Enhancing Efficiency in Vision Transformer Networks: Design Techniques and Insights
Moein Heidari
Reza Azad
Sina Ghorbani Kolahi
René Arimond
Leon Niggemeier
...
Afshin Bozorgpour
Ehsan Khodapanah Aghdam
A. Kazerouni
I. Hacihaliloglu
Dorit Merhof
41
7
0
28 Mar 2024
MIKE: A New Benchmark for Fine-grained Multimodal Entity Knowledge Editing
Jiaqi Li
Miaozeng Du
Chuanyi Zhang
Yongrui Chen
Nan Hu
Guilin Qi
Haiyun Jiang
Siyuan Cheng
Bo Tian
18
14
0
18 Feb 2024
Transformer-CNN Fused Architecture for Enhanced Skin Lesion Segmentation
Siddharth Tiwari
MedIm
ViT
20
0
0
10 Jan 2024
Weakly-Supervised 3D Visual Grounding based on Visual Linguistic Alignment
Xiaoxu Xu
Yitian Yuan
Qiudan Zhang
Wen-Bin Wu
Zequn Jie
Lin Ma
Xu Wang
56
4
0
15 Dec 2023
Guided Image Restoration via Simultaneous Feature and Image Guided Fusion
Xinyi Liu
Qian Zhao
Jie-Kai Liang
Huiyu Zeng
Deyu Meng
Lei Zhang
33
0
0
14 Dec 2023
Prompt-In-Prompt Learning for Universal Image Restoration
Zilong Li
Yiming Lei
Chenglong Ma
Junping Zhang
Hongming Shan
VLM
35
25
0
08 Dec 2023
MagDiff: Multi-Alignment Diffusion for High-Fidelity Video Generation and Editing
Haoyu Zhao
Tianyi Lu
Jiaxi Gu
Xing Zhang
Qingping Zheng
Zuxuan Wu
Hang Xu
Yu-Gang Jiang
VGen
DiffM
27
10
0
29 Nov 2023
Entangled View-Epipolar Information Aggregation for Generalizable Neural Radiance Fields
Zhiyuan Min
Yawei Luo
Wei Yang
Yuesong Wang
Yi Yang
22
2
0
20 Nov 2023
Improved TokenPose with Sparsity
Anning Li
ViT
27
0
0
16 Nov 2023
Transformer-based Multimodal Change Detection with Multitask Consistency Constraints
Biyuan Liu
Huaixin Chen
Kun Li
Michael Ying Yang
25
12
0
13 Oct 2023
EViT: An Eagle Vision Transformer with Bi-Fovea Self-Attention
Yulong Shi
Mingwei Sun
Yongshuai Wang
Hui Sun
Zengqiang Chen
29
3
0
10 Oct 2023
A survey on deep learning in medical image registration: new technologies, uncertainty, evaluation metrics, and beyond
Junyu Chen
Yihao Liu
Shuwen Wei
Zhangxing Bian
Shalini Subramanian
A. Carass
Jerry L. Prince
Yong Du
OOD
30
36
0
28 Jul 2023
Visual Prompt Flexible-Modal Face Anti-Spoofing
Zitong Yu
Rizhao Cai
Yawen Cui
Ajian Liu
Changsheng Chen
30
6
0
26 Jul 2023