2201.09450
UniFormer: Unifying Convolution and Self-attention for Visual Recognition
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
24 January 2022
Kunchang Li
Yali Wang
Junhao Zhang
Shiyang Feng
Guanglu Song
Yu Liu
Jiaming Song
Yu Qiao
ViT
Github (865★)
Papers citing
"UniFormer: Unifying Convolution and Self-attention for Visual Recognition"
50 / 178 papers shown
Towards an Effective Action-Region Tracking Framework for Fine-grained Video Action Recognition
Baoli Sun
Y. X. R. Wang
Xinzhu Ma
Zhihui Wang
Kun Lu
Zhiyong Wang
265
0
0
26 Nov 2025
Learning Skill-Attributes for Transferable Assessment in Video
Kumar Ashutosh
Kristen Grauman
229
2
0
17 Nov 2025
AGGRNet: Selective Feature Extraction and Aggregation for Enhanced Medical Image Classification
Ansh Makwe
Akansh Agrawal
Prateek Jain
Akshan Agrawal
Priyanka Bagade
140
0
0
15 Nov 2025
Enhancing Pre-trained Representation Classifiability can Boost its Interpretability
International Conference on Learning Representations (ICLR), 2025
Shufan Shen
Zhaobo Qi
Junshu Sun
Qingming Huang
Qi Tian
Shuhui Wang
FAtt
468
5
0
28 Oct 2025
Attentive Convolution: Unifying the Expressivity of Self-Attention with Convolutional Efficiency
Hao Yu
H. G. Chen
Yan Jiang
Wei Peng
Zhaodong Sun
Samuel Kaski
Guoying Zhao
191
0
0
23 Oct 2025
Image-to-Video Transfer Learning based on Image-Language Foundation Models: A Comprehensive Survey
Jinxuan Li
Chaolei Tan
Haoxuan Chen
Jianxin Ma
Jian-Fang Hu
Wei-Shi Zheng
Jianhuang Lai
VLM
255
1
0
12 Oct 2025
Every Subtlety Counts: Fine-grained Person Independence Micro-Action Recognition via Distributionally Robust Optimization
Feng-Qi Cui
Jinyang Huang
Anyang Tong
Ziyu Jia
Jie Zhang
Zhi Liu
Dan Guo
Jianwei Lu
Meng Wang
264
1
0
25 Sep 2025
CECT-Mamba: a Hierarchical Contrast-enhanced-aware Model for Pancreatic Tumor Subtyping from Multi-phase CECT
Zhifang Gong
Shuo Gao
Ben Zhao
Yingjing Xu
Yijun Yang
Shenghong Ju
Guangquan Zhou
Mamba
247
0
0
16 Sep 2025
Sparse Transformer for Ultra-sparse Sampled Video Compressive Sensing
Miao Cao
Siming Zheng
Lishun Wang
Ziyang Chen
D. Brady
Xin Yuan
221
0
0
10 Sep 2025
EmoPerso: Enhancing Personality Detection with Self-Supervised Emotion-Aware Modelling
Lingzhi Shen
Xiaohao Cai
Yunfei Long
Imran Razzak
Guanming Chen
Shoaib Jameel
272
2
0
02 Sep 2025
WaveHiT-SR: Hierarchical Wavelet Network for Efficient Image Super-Resolution
Fayaz Ali
Muhammad Zawish
Steven Davy
Radu Timofte
152
1
0
27 Aug 2025
Edge Detection for Organ Boundaries via Top Down Refinement and SubPixel Upsampling
Aarav Mehta
Priya Deshmukh
Vikram Singh
Siddharth Malhotra
Krishnan Menon Iyer
Tanvi Iyer
MedIm
356
0
0
09 Aug 2025
VesselRW: Weakly Supervised Subcutaneous Vessel Segmentation via Learned Random Walk Propagation
Ayaan Nooruddin Siddiqui
Mahnoor Zaidi
Ayesha Nazneen Shahbaz
Priyadarshini Chatterjee
Krishnan Menon Iyer
318
0
0
09 Aug 2025
DualResolution Residual Architecture with Artifact Suppression for Melanocytic Lesion Segmentation
Vikram Singh
Kabir Malhotra
Rohan Desai
Ananya Shankaracharya
Priyadarshini Chatterjee
Krishnan Menon Iyer
MedIm
400
0
0
09 Aug 2025
CoCAViT: Compact Vision Transformer with Robust Global Coordination
Xuyang Wang
Lingjuan Miao
Zhiqiang Zhou
ViT
VLM
184
1
0
07 Aug 2025
Deeply Dual Supervised learning for melanoma recognition
Rujosh Polma
Krishnan Menon Iyer
281
0
0
04 Aug 2025
Recognizing Actions from Robotic View for Natural Human-Robot Interaction
Ziyi Wang
Peiming Li
Hong Liu
Zhichao Deng
Can Wang
Jun Liu
Junsong Yuan
Mengyuan Liu
234
4
0
30 Jul 2025
A2Mamba: Attention-augmented State Space Models for Visual Recognition
Meng Lou
Yunxiang Fu
Yizhou Yu
Mamba
268
0
0
22 Jul 2025
evMLP: An Efficient Event-Driven MLP Architecture for Vision
Zhentan Zheng
VLM
288
0
0
02 Jul 2025
Combining Self-attention and Dilation Convolutional for Semantic Segmentation of Coal Maceral Groups
IEEE Internet of Things Journal (IEEE IoT J.), 2025
Zhenghao Xi
Zhengnan Lv
Yang Zheng
Xiang Liu
Zhuang Yu
Junran Chen
Jing Hu
Yaqi Liu
DiffM
188
0
0
15 Jun 2025
Burst Image Super-Resolution via Multi-Cross Attention Encoding and Multi-Scan State-Space Decoding
Image and Vision Computing (IVC), 2025
Tengda Huang
Yu Zhang
Tianren Li
Yufu Qu
Fulin Liu
Zhenzhong Wei
SupR
282
0
0
26 May 2025
Structured Initialization for Vision Transformers
Jianqiao Zheng
Xueqian Li
Hemanth Saratchandran
Simon Lucey
ViT
271
2
0
26 May 2025
MSLAU-Net: A Hybird CNN-Transformer Network for Medical Image Segmentation
Libin Lan
Yanxin Li
Xiaojuan Liu
Juan Zhou
Jianxun Zhang
Nannan Huang
Yudong Zhang
ViT
MedIm
342
3
0
24 May 2025
Dual Branch VideoMamba with Gated Class Token Fusion for Violence Detection
Damith Chamalke Senadeera
Xiaoyun Yang
Shibo Li
Muhammad Awais
Dimitrios Kollias
Gregory G. Slabaugh
Mamba
297
3
0
23 May 2025
MFogHub: Bridging Multi-Regional and Multi-Satellite Data for Global Marine Fog Detection and Forecasting
Computer Vision and Pattern Recognition (CVPR), 2025
Mengqiu Xu
Kaixin Chen
Heng Guo
Yixiang Huang
Ming Wu
Zhenwei Shi
Chuang Zhang
Jun Guo
304
3
0
15 May 2025
Learning Streaming Video Representation via Multitask Training
Yibin Yan
Jilan Xu
Shangzhe Di
Yikun Liu
Yudi Shi
Qirui Chen
Zeqian Li
Yifei Huang
Weidi Xie
CLL
554
5
0
28 Apr 2025
VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning
Zhong-Yu Li
Ruoyi Du
Juncheng Yan
Le Zhuo
Zhen Li
Peng Gao
Zhanyu Ma
Ming-Ming Cheng
VLM
459
27
0
10 Apr 2025
Audio-visual Event Localization on Portrait Mode Short Videos
Wuyang Liu
Yi Chai
Yongpeng Yan
Yanzhen Ren
345
2
0
09 Apr 2025
HGFormer: Topology-Aware Vision Transformer with HyperGraph Learning
IEEE Transactions on Multimedia (TMM), 2025
Hao Wang
Shuo Zhang
Biao Leng
ViT
715
6
0
03 Apr 2025
Spectral-Adaptive Modulation Networks for Visual Perception
Guhnoo Yun
J. Yoo
Kijung Kim
Jeongho Lee
Paul Hongsuck Seo
Dong Hwan Kim
518
0
1
31 Mar 2025
VTD-CLIP: Video-to-Text Discretization via Prompting CLIP
Wencheng Zhu
Yuexin Wang
Hongxuan Li
Q. Hu
CLIP
418
1
0
24 Mar 2025
Stitch-a-Demo: Video Demonstrations from Multistep Descriptions
Chi Hsuan Wu
Kumar Ashutosh
Kristen Grauman
VGen
3DV
364
1
0
18 Mar 2025
Underlying Semantic Diffusion for Effective and Efficient In-Context Learning
Zhong Ji
Weilong Cao
Yan Zhang
Yanwei Pang
Jungong Han
Xuelong Li
DiffM
VLM
361
1
0
06 Mar 2025
OverLoCK: An Overview-first-Look-Closely-next ConvNet with Context-Mixing Dynamic Kernels
Computer Vision and Pattern Recognition (CVPR), 2025
Meng Lou
Yizhou Yu
762
59
0
27 Feb 2025
InternVQA: Advancing Compressed Video Quality Assessment with Distilling Large Foundation Model
International Symposium on Circuits and Systems (ISCAS), 2025
Fengbin Guan
Zihao Yu
Yiting Lu
Xin Li
Zhibo Chen
405
4
0
26 Feb 2025
RT-DEMT: A hybrid real-time acupoint detection model combining mamba and transformer
Shilong Yang
Qi Zang
Chulong Zhang
Lingfeng Huang
Yaoqin Xie
Mamba
582
6
0
16 Feb 2025
CSPCL: Category Semantic Prior Contrastive Learning for Deformable DETR-Based Prohibited Item Detectors
Mingyuan Li
Tong Jia
Hui Lu
Bowen Ma
Hao Wang
Shiyi Guo
Da Cai
Dongyue Chen
418
2
0
28 Jan 2025
Slicing Vision Transformer for Flexible Inference
Neural Information Processing Systems (NeurIPS), 2024
Yitian Zhang
Huseyin Coskun
Xu Ma
Huan Wang
Ke Ma
Xi Chen
Derek Hao Hu
Y. Fu
ViT
381
2
0
06 Dec 2024
AM Flow: Adapters for Temporal Processing in Action Recognition
Tanay Agrawal
Abid Ali
A. Dantcheva
François Brémond
312
0
0
04 Nov 2024
UTSRMorph: A Unified Transformer and Superresolution Network for Unsupervised Medical Image Registration
IEEE Transactions on Medical Imaging (IEEE TMI), 2024
Runshi Zhang
Hao Mo
Junchen Wang
Bimeng Jie
Yang He
Nenghao Jin
Liang Zhu
ViT
MedIm
203
18
0
27 Oct 2024
Fuse Before Transfer: Knowledge Fusion for Heterogeneous Distillation
Guopeng Li
Qiang Wang
K. Yan
Shouhong Ding
Yuan Gao
Gui-Song Xia
468
0
0
16 Oct 2024
MoH: Multi-Head Attention as Mixture-of-Head Attention
International Conference on Machine Learning (ICML), 2024
Peng Jin
Bo Zhu
Li Yuan
Shuicheng Yan
MoE
573
47
0
15 Oct 2024
Transforming Game Play: A Comparative Study of DCQN and DTQN Architectures in Reinforcement Learning
William A. Stigall
387
1
0
14 Oct 2024
Continual Learning Improves Zero-Shot Action Recognition
Asian Conference on Computer Vision (ACCV), 2024
Shreyank N. Gowda
Davide Moltisanti
Laura Sevilla-Lara
BDL
VLM
CLL
527
4
0
14 Oct 2024
Multi-modal Vision Pre-training for Medical Image Analysis
Computer Vision and Pattern Recognition (CVPR), 2024
Shaohao Rui
Lingzhi Chen
Zhenyu Tang
Lilong Wang
M. Liu
Shanghang Zhang
Xiaosong Wang
415
0
0
14 Oct 2024
Generating Intermediate Representations for Compositional Text-To-Image Generation
Ran Galun
Sagie Benaim
247
1
0
13 Oct 2024
Polyp-SES: Automatic Polyp Segmentation with Self-Enriched Semantic Model
Asian Conference on Computer Vision (ACCV), 2024
Quang Vinh Nguyen
Thanh Hoang Son Vo
Sae-Ryung Kang
Soo-Hyung Kim
293
2
0
02 Oct 2024
Progressive Representation Learning for Real-Time UAV Tracking
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2024
Changhong Fu
Xiang Lei
Haobo Zuo
Weitong Chen
Guangze Zheng
Jia Pan
AI4TS
328
19
0
25 Sep 2024
Resolving Inconsistent Semantics in Multi-Dataset Image Segmentation
Qilong Zhangli
Di Liu
Abhishek Aich
Dimitris Metaxas
S. Schulter
229
2
0
15 Sep 2024
SparX: A Sparse Cross-Layer Connection Mechanism for Hierarchical Vision Mamba and Transformer Networks
AAAI Conference on Artificial Intelligence (AAAI), 2024
Meng Lou
Yunxiang Fu
Yizhou Yu
Mamba
320
30
0
15 Sep 2024