SwiftFormer: Efficient Additive Attention for Transformer-based Real-time Mobile Vision Applications

27 March 2023

Abdelrahman M. Shaker

Salman Khan

Papers citing "SwiftFormer: Efficient Additive Attention for Transformer-based Real-time Mobile Vision Applications"

45 / 45 papers shown

Image Recognition with Online Lightweight Vision Transformer: A Survey Zherui Zhang Rongtao Xu Jie Zhou Changwei Wang Xingtian Pei ... Jiguang Zhang Li Guo Longxiang Gao W. Xu Shibiao Xu ViT 60 0 0 06 May 2025
An Adaptive Data-Resilient Multi-Modal Framework for Hierarchical Multi-Label Book Genre Identification Utsav Nareti S. Chattopadhyay Prolay Mallick Suraj Kumar Ayush Vikas Daga Chandranath Adak Adarsh Wase Arjab Roy 18 0 0 05 May 2025
The Fourth Monocular Depth Estimation Challenge Anton Obukhov Matteo Poggi Fabio Tosi Ripudaman Singh Arora Jaime Spencer ... Tuan-Anh Yang Minh-Quang Nguyen T. Tran Albert Luginov Muhammad Shahzad MDE 55 0 0 24 Apr 2025
LSNet: See Large, Focus Small Ao Wang Hui Chen Zijia Lin J. Han Guiguang Ding 37 0 0 29 Mar 2025
Mobile-VideoGPT: Fast and Accurate Video Understanding Language Model Abdelrahman M. Shaker Muhammad Maaz Chenhui Gou Hamid Rezatofighi Salman Khan F. Khan 70 0 0 27 Mar 2025
Fraesormer: Learning Adaptive Sparse Transformer for Efficient Food Recognition Shun Zou Yi Zou Mingya Zhang Shipeng Luo Zhihao Chen Guangwei Gao ViT 43 0 0 15 Mar 2025
Partial Convolution Meets Visual Attention Haiduo Huang Fuwei Yang D. Li Ji Liu Lu Tian Jinzhang Peng Pengju Ren E. Barsoum 3DH 112 0 0 05 Mar 2025
Neural Attention: A Novel Mechanism for Enhanced Expressive Power in Transformer Models Andrew DiGiugno Ausif Mahmood 33 0 0 24 Feb 2025
iFormer: Integrating ConvNet and Transformer for Mobile Application Chuanyang Zheng ViT 67 0 0 26 Jan 2025
Rethinking Encoder-Decoder Flow Through Shared Structures Frederik Laboyrie M. K. Yucel Albert Saà-Garriga AI4CE 40 0 0 24 Jan 2025
SurgRIPE challenge: Benchmark of Surgical Robot Instrument Pose Estimation Haozheng Xu Alistair Weld Chi Xu Alfie Roddan João Cartucho ... Lucy Fothergill Dominic Jones Pietro Valdastri Duygu Sarikaya Stamatia Giannarou 27 1 0 06 Jan 2025
A Separable Self-attention Inspired by the State Space Model for Computer Vision Juntao Zhang Shaogeng Liu Kun Bian You Zhou Pei Zhang Jianning Liu Jun Zhou Bingyan Liu Mamba 45 0 0 03 Jan 2025
RecConv: Efficient Recursive Convolutions for Multi-Frequency Representations Mingshu Zhao Yi Luo Yong Ouyang 31 0 0 27 Dec 2024
TinyViM: Frequency Decoupling for Tiny Hybrid Vision Mamba Xiaowen Ma Zhenliang Ni Xinghao Chen Mamba 73 2 0 26 Nov 2024
EfficientViM: Efficient Vision Mamba with Hidden State Mixer based State Space Duality Sanghyeok Lee Joonmyung Choi Hyunwoo J. Kim 110 3 0 22 Nov 2024
Compositional Segmentation of Cardiac Images Leveraging Metadata Abbas Khan Muhammad Asad Martin Benning C. Roney Gregory Slabaugh 26 0 0 30 Oct 2024
Lodge++: High-quality and Long Dance Generation with Vivid Choreography Patterns Ronghui Li Hongwen Zhang Yachao Zhang Yuxiang Zhang Youliang Zhang Jie Guo Yan Zhang Xiu Li Yebin Liu 30 6 0 27 Oct 2024
PETAH: Parameter Efficient Task Adaptation for Hybrid Transformers in a resource-limited Context Maximilian Augustin Syed Shakib Sarwar Mostafa Elhoushi Sai Qian Zhang Yuecheng Li B. D. Salvo 20 0 0 23 Oct 2024
HRVMamba: High-Resolution Visual State Space Model for Dense Prediction Hao Zhang Yongqiang Ma Wenqi Shao Ping Luo Nanning Zheng Kaipeng Zhang Mamba 28 1 0 04 Oct 2024
NimbleD: Enhancing Self-supervised Monocular Depth Estimation with Pseudo-labels and Large-scale Video Pre-training Albert Luginov Muhammad Shahzad SSL MDE 29 1 0 26 Aug 2024
CAS-ViT: Convolutional Additive Self-attention Vision Transformers for Efficient Mobile Applications Tianfang Zhang Lei Li Yang Zhou Wentao Liu Chen Qian Xiangyang Ji ViT 28 9 0 07 Aug 2024
GroupMamba: Efficient Group-Based Visual State Space Model Abdelrahman M. Shaker Syed Talal Wasim Salman Khan Juergen Gall Fahad Shahbaz Khan Mamba 51 0 0 18 Jul 2024
PADRe: A Unifying Polynomial Attention Drop-in Replacement for Efficient Vision Transformer Pierre-David Létourneau Manish Kumar Singh Hsin-Pai Cheng Shizhong Han Yunxiao Shi Dalton Jones M. H. Langston Hong Cai Fatih Porikli 32 0 0 16 Jul 2024
RepNeXt: A Fast Multi-Scale CNN using Structural Reparameterization Mingshu Zhao Yi Luo Yong Ouyang 30 2 0 23 Jun 2024
Decoupling Forgery Semantics for Generalizable Deepfake Detection Wei Ye Xinan He Feng Ding 30 8 0 14 Jun 2024
ToSA: Token Selective Attention for Efficient Vision Transformers Manish Kumar Singh R. Yasarla Hong Cai Mingu Lee Fatih Porikli 44 0 0 13 Jun 2024
Convolution and Attention-Free Mamba-based Cardiac Image Segmentation Abbas Khan Muhammad Asad Martin Benning C. Roney Gregory Slabaugh Mamba 22 2 0 09 Jun 2024
Automatic Channel Pruning for Multi-Head Attention Eunho Lee Youngbae Hwang ViT 32 1 0 31 May 2024
CUE-Net: Violence Detection Video Analytics with Spatial Cropping, Enhanced UniformerV2 and Modified Efficient Additive Attention Damith Chamalke Senadeera Xiaoyun Yang Dimitrios Kollias Gregory G. Slabaugh 27 0 0 27 Apr 2024
HSViT: Horizontally Scalable Vision Transformer Chenhao Xu Chang-Tsun Li Chee Peng Lim Douglas Creighton ViT 21 1 0 08 Apr 2024
Enhancing Efficiency in Vision Transformer Networks: Design Techniques and Insights Moein Heidari Reza Azad Sina Ghorbani Kolahi René Arimond Leon Niggemeier ... Afshin Bozorgpour Ehsan Khodapanah Aghdam A. Kazerouni I. Hacihaliloglu Dorit Merhof 41 7 0 28 Mar 2024
PEM: Prototype-based Efficient MaskFormer for Image Segmentation Niccolò Cavagnero Gabriele Rosi Claudia Cuttano Francesca Pistilli Marco Ciccone Giuseppe Averta Fabio Cermelli 38 21 0 29 Feb 2024
Crop and Couple: cardiac image segmentation using interlinked specialist networks Abbas Khan Muhammad Asad Martin Benning C. Roney Gregory Slabaugh 27 3 0 14 Feb 2024
SHViT: Single-Head Vision Transformer with Memory Efficient Macro Design Seokju Yun Youngmin Ro ViT 34 29 0 29 Jan 2024
Single-sample versus case-control sampling scheme for Positive Unlabeled data: the story of two scenarios Jan Mielniczuk Adam Wawrzeńczyk 10 2 0 04 Dec 2023
MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training Pavan Kumar Anasosalu Vasu Hadi Pouransari Fartash Faghri Raviteja Vemulapalli Oncel Tuzel CLIP VLM 11 43 0 28 Nov 2023
Which Transformer to Favor: A Comparative Analysis of Efficiency in Vision Transformers Tobias Christian Nauen Sebastián M. Palacio Federico Raue Andreas Dengel 35 3 0 18 Aug 2023
RepViT: Revisiting Mobile CNN From ViT Perspective Ao Wang Hui Chen Zijia Lin Hengjun Pu Guiguang Ding 27 169 0 18 Jul 2023
Spike-driven Transformer Man Yao Jiakui Hu Zhaokun Zhou Liuliang Yuan Yonghong Tian Boxing Xu Guoqi Li 21 109 0 04 Jul 2023
Efficient Large-Scale Visual Representation Learning And Evaluation Eden Dolev A. Awad Denisa Roberts Zahra Ebrahimzadeh Marcin Mejran Vaibhav Malpani Mahir Yavuz 30 0 0 22 May 2023
MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer Sachin Mehta Mohammad Rastegari ViT 189 1,148 0 05 Oct 2021
Mobile-Former: Bridging MobileNet and Transformer Yinpeng Chen Xiyang Dai Dongdong Chen Mengchen Liu Xiaoyi Dong Lu Yuan Zicheng Liu ViT 172 462 0 12 Aug 2021
MLP-Mixer: An all-MLP Architecture for Vision Ilya O. Tolstikhin N. Houlsby Alexander Kolesnikov Lucas Beyer Xiaohua Zhai ... Andreas Steiner Daniel Keysers Jakob Uszkoreit Mario Lucic Alexey Dosovitskiy 239 2,554 0 04 May 2021
Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions Wenhai Wang Enze Xie Xiang Li Deng-Ping Fan Kaitao Song Ding Liang Tong Lu Ping Luo Ling Shao ViT 263 3,538 0 24 Feb 2021
MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications Andrew G. Howard Menglong Zhu Bo Chen Dmitry Kalenichenko Weijun Wang Tobias Weyand M. Andreetto Hartwig Adam 3DH 948 20,214 0 17 Apr 2017