2103.14899
Cited By
CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification
27 March 2021
Chun-Fu Chen
Quanfu Fan
Rameswar Panda
ViT
Papers citing
"CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification"
50 / 134 papers shown
Title
Nonlinear Motion-Guided and Spatio-Temporal Aware Network for Unsupervised Event-Based Optical Flow
Zuntao Liu
Hao Zhuang
Junjie Jiang
Yuhang Song
Zheng Fang
43
0
0
08 May 2025
Balancing Accuracy, Calibration, and Efficiency in Active Learning with Vision Transformers Under Label Noise
Moseli Motsóehli
Hope Mogale
Kyungim Baek
38
0
0
07 May 2025
Image Recognition with Online Lightweight Vision Transformer: A Survey
Zherui Zhang
Rongtao Xu
Jie Zhou
Changwei Wang
Xingtian Pei
...
Jiguang Zhang
Li Guo
Longxiang Gao
W. Xu
Shibiao Xu
ViT
60
0
0
06 May 2025
Token Coordinated Prompt Attention is Needed for Visual Prompting
Zichen Liu
Xu Zou
Gang Hua
Jiahuan Zhou
26
0
0
05 May 2025
Multi-Scale Graph Learning for Anti-Sparse Downscaling
Yingda Fan
Runlong Yu
Janet R. Barclay
A. Appling
Yiming Sun
Yiqun Xie
Xiaowei Jia
AI4CE
18
0
0
03 May 2025
RadioFormer: A Multiple-Granularity Radio Map Estimation Transformer with 1‱ Spatial Sampling
Zheng Fang
Kangjun Liu
Ke Chen
Qingyu Liu
J. Zhang
Lingyang Song
Yaowei Wang
36
0
0
27 Apr 2025
Towards Accurate and Interpretable Neuroblastoma Diagnosis via Contrastive Multi-scale Pathological Image Analysis
Zhu Zhu
Shuo Jiang
Jingyuan Zheng
Yawen Li
Yifei Chen
Manli Zhao
Weizhong Gu
Feiwei Qin
Jinhu Wang
Gang Yu
MedIm
33
0
0
18 Apr 2025
ID-Booth: Identity-consistent Face Generation with Diffusion Models
Darian Tomašević
Fadi Boutros
Chenhao Lin
Naser Damer
Vitomir Štruc
Peter Peer
DiffM
55
1
0
10 Apr 2025
HOTFormerLoc: Hierarchical Octree Transformer for Versatile Lidar Place Recognition Across Ground and Aerial Views
Ethan Griffiths
Maryam Haghighat
Simon Denman
Clinton Fookes
Milad Ramezani
3DPC
57
0
0
11 Mar 2025
TSCnet: A Text-driven Semantic-level Controllable Framework for Customized Low-Light Image Enhancement
Miao Zhang
Jun Yin
Pengyu Zeng
Yiqing Shen
Shuai Lu
Xueqian Wang
DiffM
63
6
0
11 Mar 2025
Exploring Token-Level Augmentation in Vision Transformer for Semi-Supervised Semantic Segmentation
Dengke Zhang
Quan Tang
Fagui Liu
C. L. Philip Chen
Haiqing Mei
ViT
50
0
0
04 Mar 2025
SafeText: Safe Text-to-image Models via Aligning the Text Encoder
Yuepeng Hu
Zhengyuan Jiang
Neil Zhenqiang Gong
42
1
0
28 Feb 2025
Spiking Transformer: Introducing Accurate Addition-Only Spiking Self-Attention for Transformer
Yufei Guo
Xiaode Liu
Y. Chen
Weihang Peng
Yuhan Zhang
Zhe Ma
MQ
43
0
0
28 Feb 2025
Enhancing Vehicle Make and Model Recognition with 3D Attention Modules
Narges Semiromizadeh
Omid Nejati Manzari
S. B. Shokouhi
S. Mirzakuchaki
ViT
83
0
0
24 Feb 2025
ESIQA: Perceptual Quality Assessment of Vision-Pro-based Egocentric Spatial Images
Zhirui Kuai
Liu Yang
Huiyu Duan
Yuxing Han
Guoyu Tang
P. Callet
73
2
0
24 Feb 2025
QCS: Feature Refining from Quadruplet Cross Similarity for Facial Expression Recognition
C. Wang
Li Chen
Lili Wang
Zhaofan Li
Xuebin Lv
76
1
0
28 Jan 2025
Multiscaled Multi-Head Attention-based Video Transformer Network for Hand Gesture Recognition
Mallika Garg
Debashis Ghosh
P. M. Pradhan
SLR
30
15
0
03 Jan 2025
VMamba: Visual State Space Model
Yue Liu
Yunjie Tian
Yuzhong Zhao
Hongtian Yu
Lingxi Xie
Yaowei Wang
Qixiang Ye
Jianbin Jiao
Yunfan Liu
Mamba
106
592
0
31 Dec 2024
ShadowMamba: State-Space Model with Boundary-Region Selective Scan for Shadow Removal
Xiujin Zhu
Chee-Onn Chow
Joon Huang Chuah
Mamba
40
0
0
05 Nov 2024
AttriPrompter: Auto-Prompting with Attribute Semantics for Zero-shot Nuclei Detection via Visual-Language Pre-trained Models
Yongjian Wu
Yang Zhou
Jiya Saiyin
Bingzheng Wei
M. Lai
Jianzhong Shou
Yan Xu
VLM
MedIm
25
0
0
22 Oct 2024
A three-dimensional force estimation method for the cable-driven soft robot based on monocular images
Xiaohan Zhu
Ran Bu
Zhen Li
Fan Xu
Hesheng Wang
18
0
0
12 Sep 2024
Disparity Estimation Using a Quad-Pixel Sensor
Zhuofeng Wu
Doehyung Lee
Zihua Liu
Kazunori Yoshizaki
Yusuke Monno
Masatoshi Okutomi
MDE
18
1
0
01 Sep 2024
MeshUp: Multi-Target Mesh Deformation via Blended Score Distillation
Hyunwoo Kim
Itai Lang
Noam Aigerman
Thibault Groueix
Vladimir G. Kim
Rana Hanocka
AI4CE
37
3
0
27 Aug 2024
Mixed-View Panorama Synthesis using Geospatially Guided Diffusion
Zhexiao Xiong
Xin Xing
Scott Workman
Subash Khanal
Nathan Jacobs
DiffM
MDE
52
1
0
12 Jul 2024
iiANET: Inception Inspired Attention Hybrid Network for efficient Long-Range Dependency
Haruna Yunusa
Qin Shiyin
Abdulrahman Hamman Adama Chukkol
Isah Bello
A. Lawan
39
4
0
10 Jul 2024
MambaVision: A Hybrid Mamba-Transformer Vision Backbone
Ali Hatamizadeh
Jan Kautz
Mamba
33
56
0
10 Jul 2024
Isomorphic Pruning for Vision Models
Gongfan Fang
Xinyin Ma
Michael Bi Mi
Xinchao Wang
VLM
ViT
34
6
0
05 Jul 2024
Towards Attention-based Contrastive Learning for Audio Spoof Detection
C. Goel
Surya Koppisetti
Ben Colman
Ali Shahriyari
Gaurav Bharaj
50
5
0
03 Jul 2024
Cross-Modal Attention Alignment Network with Auxiliary Text Description for zero-shot sketch-based image retrieval
Hanwen Su
G. Song
K. Huang
Jiyan Wang
Ming Yang
41
1
0
01 Jul 2024
The 3D-PC: a benchmark for visual perspective taking in humans and machines
Drew Linsley
Peisen Zhou
A. Ashok
Akash Nagaraj
Gaurav Gaonkar
Francis E Lewis
Zygmunt Pizlo
Thomas Serre
41
6
0
06 Jun 2024
A Survey of Transformer Enabled Time Series Synthesis
Alexander Sommers
Logan Cummins
Sudip Mittal
Shahram Rahimi
Maria Seale
Joseph Jaboure
Thomas Arnold
AI4TS
33
2
0
04 Jun 2024
SurgiTrack: Fine-Grained Multi-Class Multi-Tool Tracking in Surgical Videos
C. Nwoye
N. Padoy
22
2
0
30 May 2024
Improving global awareness of linkset predictions using Cross-Attentive Modulation tokens
Félix Marcoccia
C. Adjih
P. Mühlethaler
28
0
0
28 May 2024
Vision Transformer with Sparse Scan Prior
Qihang Fan
Huaibo Huang
Mingrui Chen
Ran He
ViT
36
4
0
22 May 2024
Dynamic Identity-Guided Attention Network for Visible-Infrared Person Re-identification
Peng Gao
Yujian Lee
Hui Zhang
Xubo Liu
Yiyang Hu
Guquan Jing
16
1
0
21 May 2024
Visual Language Model based Cross-modal Semantic Communication Systems
Feibo Jiang
Chuanguo Tang
Li Dong
Kezhi Wang
Kun Yang
Cunhua Pan
VLM
31
2
0
06 May 2024
Unsupervised Dynamics Prediction with Object-Centric Kinematics
Yeon-Ji Song
Suhyung Choi
Jaein Kim
Jin-Hwa Kim
Byoung-Tak Zhang
31
0
0
29 Apr 2024
Enhancing Efficiency in Vision Transformer Networks: Design Techniques and Insights
Moein Heidari
Reza Azad
Sina Ghorbani Kolahi
René Arimond
Leon Niggemeier
...
Afshin Bozorgpour
Ehsan Khodapanah Aghdam
A. Kazerouni
I. Hacihaliloglu
Dorit Merhof
41
7
0
28 Mar 2024
MIKE: A New Benchmark for Fine-grained Multimodal Entity Knowledge Editing
Jiaqi Li
Miaozeng Du
Chuanyi Zhang
Yongrui Chen
Nan Hu
Guilin Qi
Haiyun Jiang
Siyuan Cheng
Bo Tian
18
14
0
18 Feb 2024
Transformer-CNN Fused Architecture for Enhanced Skin Lesion Segmentation
Siddharth Tiwari
MedIm
ViT
20
0
0
10 Jan 2024
Weakly-Supervised 3D Visual Grounding based on Visual Linguistic Alignment
Xiaoxu Xu
Yitian Yuan
Qiudan Zhang
Wen-Bin Wu
Zequn Jie
Lin Ma
Xu Wang
56
4
0
15 Dec 2023
Guided Image Restoration via Simultaneous Feature and Image Guided Fusion
Xinyi Liu
Qian Zhao
Jie-Kai Liang
Huiyu Zeng
Deyu Meng
Lei Zhang
33
0
0
14 Dec 2023
Prompt-In-Prompt Learning for Universal Image Restoration
Zilong Li
Yiming Lei
Chenglong Ma
Junping Zhang
Hongming Shan
VLM
35
25
0
08 Dec 2023
MagDiff: Multi-Alignment Diffusion for High-Fidelity Video Generation and Editing
Haoyu Zhao
Tianyi Lu
Jiaxi Gu
Xing Zhang
Qingping Zheng
Zuxuan Wu
Hang Xu
Yu-Gang Jiang
VGen
DiffM
27
10
0
29 Nov 2023
Entangled View-Epipolar Information Aggregation for Generalizable Neural Radiance Fields
Zhiyuan Min
Yawei Luo
Wei Yang
Yuesong Wang
Yi Yang
22
2
0
20 Nov 2023
Improved TokenPose with Sparsity
Anning Li
ViT
27
0
0
16 Nov 2023
Transformer-based Multimodal Change Detection with Multitask Consistency Constraints
Biyuan Liu
Huaixin Chen
Kun Li
Michael Ying Yang
25
12
0
13 Oct 2023
EViT: An Eagle Vision Transformer with Bi-Fovea Self-Attention
Yulong Shi
Mingwei Sun
Yongshuai Wang
Hui Sun
Zengqiang Chen
29
3
0
10 Oct 2023
A survey on deep learning in medical image registration: new technologies, uncertainty, evaluation metrics, and beyond
Junyu Chen
Yihao Liu
Shuwen Wei
Zhangxing Bian
Shalini Subramanian
A. Carass
Jerry L. Prince
Yong Du
OOD
30
36
0
28 Jul 2023
Visual Prompt Flexible-Modal Face Anti-Spoofing
Zitong Yu
Rizhao Cai
Yawen Cui
Ajian Liu
Changsheng Chen
30
6
0
26 Jul 2023