UniFormer: Unifying Convolution and Self-attention for Visual Recognition

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022

24 January 2022

Yu Qiao

Papers citing "UniFormer: Unifying Convolution and Self-attention for Visual Recognition"

50 / 178 papers shown

Towards an Effective Action-Region Tracking Framework for Fine-grained Video Action Recognition

265

26 Nov 2025

Learning Skill-Attributes for Transferable Assessment in Video

Kumar Ashutosh

Kristen Grauman

229

17 Nov 2025

AGGRNet: Selective Feature Extraction and Aggregation for Enhanced Medical Image Classification

140

15 Nov 2025

Enhancing Pre-trained Representation Classifiability can Boost its Interpretability

International Conference on Learning Representations (ICLR), 2025

468

28 Oct 2025

Attentive Convolution: Unifying the Expressivity of Self-Attention with Convolutional Efficiency

191

23 Oct 2025

Image-to-Video Transfer Learning based on Image-Language Foundation Models: A Comprehensive Survey

255

12 Oct 2025

Every Subtlety Counts: Fine-grained Person Independence Micro-Action Recognition via Distributionally Robust Optimization

264

25 Sep 2025

CECT-Mamba: a Hierarchical Contrast-enhanced-aware Model for Pancreatic Tumor Subtyping from Multi-phase CECT

247

16 Sep 2025

Sparse Transformer for Ultra-sparse Sampled Video Compressive Sensing

221

10 Sep 2025

EmoPerso: Enhancing Personality Detection with Self-Supervised Emotion-Aware Modelling

272

02 Sep 2025

WaveHiT-SR: Hierarchical Wavelet Network for Efficient Image Super-Resolution

152

27 Aug 2025

Edge Detection for Organ Boundaries via Top Down Refinement and SubPixel Upsampling

356

09 Aug 2025

VesselRW: Weakly Supervised Subcutaneous Vessel Segmentation via Learned Random Walk Propagation

Ayaan Nooruddin Siddiqui

Mahnoor Zaidi

Ayesha Nazneen Shahbaz

Priyadarshini Chatterjee

Krishnan Menon Iyer

318

09 Aug 2025

DualResolution Residual Architecture with Artifact Suppression for Melanocytic Lesion Segmentation

Vikram Singh

Kabir Malhotra

Rohan Desai

Ananya Shankaracharya

Priyadarshini Chatterjee

Krishnan Menon Iyer

MedIm

400

09 Aug 2025

CoCAViT: Compact Vision Transformer with Robust Global Coordination

184

07 Aug 2025

Deeply Dual Supervised learning for melanoma recognition

Rujosh Polma

Krishnan Menon Iyer

281

04 Aug 2025

Recognizing Actions from Robotic View for Natural Human-Robot Interaction

234

30 Jul 2025

A2Mamba: Attention-augmented State Space Models for Visual Recognition

268

22 Jul 2025

evMLP: An Efficient Event-Driven MLP Architecture for Vision

Zhentan Zheng

VLM

288

02 Jul 2025

Combining Self-attention and Dilation Convolutional for Semantic Segmentation of Coal Maceral Groups

IEEE Internet of Things Journal (IEEE IoT J.), 2025

188

15 Jun 2025

Burst Image Super-Resolution via Multi-Cross Attention Encoding and Multi-Scan State-Space Decoding

Image and Vision Computing (IVC), 2025

282

26 May 2025

Structured Initialization for Vision Transformers

Jianqiao Zheng

Xueqian Li

Hemanth Saratchandran

Simon Lucey

ViT

271

26 May 2025

MSLAU-Net: A Hybird CNN-Transformer Network for Medical Image Segmentation

342

24 May 2025

Dual Branch VideoMamba with Gated Class Token Fusion for Violence Detection

Damith Chamalke Senadeera

297

23 May 2025

MFogHub: Bridging Multi-Regional and Multi-Satellite Data for Global Marine Fog Detection and Forecasting

Computer Vision and Pattern Recognition (CVPR), 2025

304

15 May 2025

Learning Streaming Video Representation via Multitask Training

554

28 Apr 2025

VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning

459

10 Apr 2025

Audio-visual Event Localization on Portrait Mode Short Videos

345

09 Apr 2025

HGFormer: Topology-Aware Vision Transformer with HyperGraph Learning

IEEE Transactions on Multimedia (TMM), 2025

715

03 Apr 2025

Spectral-Adaptive Modulation Networks for Visual Perception

518

31 Mar 2025

VTD-CLIP: Video-to-Text Discretization via Prompting CLIP

418

24 Mar 2025

Stitch-a-Demo: Video Demonstrations from Multistep Descriptions

364

18 Mar 2025

Underlying Semantic Diffusion for Effective and Efficient In-Context Learning

361

06 Mar 2025

OverLoCK: An Overview-first-Look-Closely-next ConvNet with Context-Mixing Dynamic Kernels

Computer Vision and Pattern Recognition (CVPR), 2025

Meng Lou

Yizhou Yu

762

27 Feb 2025

InternVQA: Advancing Compressed Video Quality Assessment with Distilling Large Foundation Model

International Symposium on Circuits and Systems (ISCAS), 2025

405

26 Feb 2025

RT-DEMT: A hybrid real-time acupoint detection model combining mamba and transformer

582

16 Feb 2025

CSPCL: Category Semantic Prior Contrastive Learning for Deformable DETR-Based Prohibited Item Detectors

418

28 Jan 2025

Slicing Vision Transformer for Flexible Inference

Neural Information Processing Systems (NeurIPS), 2024

381

06 Dec 2024

AM Flow: Adapters for Temporal Processing in Action Recognition

312

04 Nov 2024

UTSRMorph: A Unified Transformer and Superresolution Network for Unsupervised Medical Image Registration

IEEE Transactions on Medical Imaging (IEEE TMI), 2024

203

27 Oct 2024

Fuse Before Transfer: Knowledge Fusion for Heterogeneous Distillation

468

16 Oct 2024

MoH: Multi-Head Attention as Mixture-of-Head Attention

International Conference on Machine Learning (ICML), 2024

573

15 Oct 2024

Transforming Game Play: A Comparative Study of DCQN and DTQN Architectures in Reinforcement Learning

William A. Stigall

387

14 Oct 2024

Continual Learning Improves Zero-Shot Action Recognition

Asian Conference on Computer Vision (ACCV), 2024

527

14 Oct 2024

Multi-modal Vision Pre-training for Medical Image Analysis

Computer Vision and Pattern Recognition (CVPR), 2024

415

14 Oct 2024

Generating Intermediate Representations for Compositional Text-To-Image Generation

Ran Galun

Sagie Benaim

247

13 Oct 2024

Polyp-SES: Automatic Polyp Segmentation with Self-Enriched Semantic Model

Asian Conference on Computer Vision (ACCV), 2024

293

02 Oct 2024

Progressive Representation Learning for Real-Time UAV Tracking

IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2024

Guangze Zheng

328

25 Sep 2024

Resolving Inconsistent Semantics in Multi-Dataset Image Segmentation

Qilong Zhangli

Di Liu

Abhishek Aich

Dimitris Metaxas

S. Schulter

229

15 Sep 2024

SparX: A Sparse Cross-Layer Connection Mechanism for Hierarchical Vision Mamba and Transformer Networks

AAAI Conference on Artificial Intelligence (AAAI), 2024

Meng Lou

Yunxiang Fu

Yizhou Yu

Mamba

320

15 Sep 2024