arXiv: 2112.13890
SPViT: Enabling Faster Vision Transformers via Soft Token Pruning
27 December 2021
Zhenglun Kong
Peiyan Dong
Xiaolong Ma
Xin Meng
Mengshu Sun
Wei Niu
Xuan Shen
Geng Yuan
Bin Ren
Minghai Qin
H. Tang
Yanzhi Wang
ViT
Papers citing "SPViT: Enabling Faster Vision Transformers via Soft Token Pruning"
30 / 30 papers shown
Image Recognition with Online Lightweight Vision Transformer: A Survey
Zherui Zhang
Rongtao Xu
Jie Zhou
Changwei Wang
Xingtian Pei
...
Jiguang Zhang
Li Guo
Longxiang Gao
W. Xu
Shibiao Xu
ViT
54
0
0
06 May 2025
Back to Fundamentals: Low-Level Visual Features Guided Progressive Token Pruning
Yuanbing Ouyang
Yizhuo Liang
Qingpeng Li
Xinfei Guo
Yiming Luo
Di Wu
Hao Wang
Yushan Pan
ViT
VLM
64
0
0
25 Apr 2025
Keyframe-oriented Vision Token Pruning: Enhancing Efficiency of Large Vision Language Models on Long-Form Video Processing
Yudong Liu
Jingwei Sun
Yueqian Lin
Jingyang Zhang
Ming Yin
Qinsi Wang
J. Zhang
H. Li
Y. Chen
VLM
68
2
0
13 Mar 2025
When Large Vision-Language Model Meets Large Remote Sensing Imagery: Coarse-to-Fine Text-Guided Token Pruning
Junwei Luo
Yingying Zhang
X. J. Yang
Kang Wu
Qi Zhu
Lei Liang
Jingdong Chen
Yansheng Li
62
0
0
10 Mar 2025
MaskGaussian: Adaptive 3D Gaussian Representation from Probabilistic Masks
Yifei Liu
Zhihang Zhong
Yifan Zhan
Sheng Xu
Xiao Sun
3DGS
51
3
0
29 Dec 2024
A-VL: Adaptive Attention for Large Vision-Language Models
Junyang Zhang
Mu Yuan
Ruiguang Zhong
Puhan Luo
Huiyou Zhan
Ningkang Zhang
Chengchen Hu
Xiangyang Li
VLM
36
1
0
23 Sep 2024
Fit and Prune: Fast and Training-free Visual Token Pruning for Multi-modal Large Language Models
Weihao Ye
Qiong Wu
Wenhao Lin
Yiyi Zhou
VLM
27
10
0
16 Sep 2024
TempMe: Video Temporal Token Merging for Efficient Text-Video Retrieval
Leqi Shen
Tianxiang Hao
Tao He
Sicheng Zhao
Pengzhang Liu
Yongjun Bao
Guiguang Ding
70
7
0
02 Sep 2024
MimiQ: Low-Bit Data-Free Quantization of Vision Transformers with Encouraging Inter-Head Attention Similarity
Kanghyun Choi
Hyeyoon Lee
Dain Kwon
Sunjong Park
Kyuyeun Kim
Noseong Park
Jinho Lee
MQ
37
1
0
29 Jul 2024
LPViT: Low-Power Semi-structured Pruning for Vision Transformers
Kaixin Xu
Zhe Wang
Chunyun Chen
Xue Geng
Jie Lin
Xulei Yang
Min-man Wu
Min Wu
Xiaoli Li
Weisi Lin
ViT
VLM
43
5
0
02 Jul 2024
Accelerating Transformers with Spectrum-Preserving Token Merging
Hoai-Chau Tran
D. M. Nguyen
Duy M. Nguyen
Trung Thanh Nguyen
Ngan Le
Pengtao Xie
Daniel Sonntag
James Y. Zou
Binh T. Nguyen
Mathias Niepert
32
8
0
25 May 2024
Adaptive Depth Networks with Skippable Sub-Paths
Woochul Kang
28
1
0
27 Dec 2023
How can objects help action recognition?
Xingyi Zhou
Anurag Arnab
Chen Sun
Cordelia Schmid
30
14
0
20 Jun 2023
Zero-TPrune: Zero-Shot Token Pruning through Leveraging of the Attention Graph in Pre-Trained Transformers
Hongjie Wang
Bhishma Dedhia
N. Jha
ViT
VLM
28
25
0
27 May 2023
Do We Really Need a Large Number of Visual Prompts?
Youngeun Kim
Yuhang Li
Abhishek Moitra
Ruokai Yin
Priyadarshini Panda
VLM
VPVLM
34
5
0
26 May 2023
RIFormer: Keep Your Vision Backbone Effective While Removing Token Mixer
Jiahao Wang
Songyang Zhang
Yong Liu
Taiqiang Wu
Yujiu Yang
Xihui Liu
Kai-xiang Chen
Ping Luo
Dahua Lin
11
20
0
12 Apr 2023
Training-Free Acceleration of ViTs with Delayed Spatial Merging
J. Heo
Seyedarmin Azizi
A. Fayyazi
Massoud Pedram
36
3
0
04 Mar 2023
Dynamic Feature Pruning and Consolidation for Occluded Person Re-Identification
Yuteng Ye
Hang Zhou
Jiale Cai
Chenxing Gao
Youjia Zhang
Junle Wang
Qiang Hu
Junqing Yu
Wei Yang
15
5
0
27 Nov 2022
HeatViT: Hardware-Efficient Adaptive Token Pruning for Vision Transformers
Peiyan Dong
Mengshu Sun
Alec Lu
Yanyue Xie
Li-Yu Daisy Liu
...
Xin Meng
Z. Li
Xue Lin
Zhenman Fang
Yanzhi Wang
ViT
18
56
0
15 Nov 2022
Expediting Large-Scale Vision Transformer for Dense Prediction without Fine-tuning
Weicong Liang
Yuhui Yuan
Henghui Ding
Xiao Luo
Weihong Lin
Ding Jia
Zheng-Wei Zhang
Chao Zhang
Hanhua Hu
17
25
0
03 Oct 2022
Dynamic Focus-aware Positional Queries for Semantic Segmentation
Haoyu He
Jianfei Cai
Zizheng Pan
Jing Liu
Jing Zhang
Dacheng Tao
Bohan Zhuang
29
16
0
04 Apr 2022
T6D-Direct: Transformers for Multi-Object 6D Pose Direct Regression
Arash A. Amini
Arul Selvam Periyasamy
Sven Behnke
ViT
47
28
0
22 Sep 2021
Pix2seq: A Language Modeling Framework for Object Detection
Ting Chen
Saurabh Saxena
Lala Li
David J. Fleet
Geoffrey E. Hinton
MLLM
ViT
VLM
233
341
0
22 Sep 2021
PSViT: Better Vision Transformer via Token Pooling and Attention Sharing
Boyu Chen
Peixia Li
Baopu Li
Chuming Li
Lei Bai
Chen Lin
Ming-hui Sun
Junjie Yan
Wanli Ouyang
ViT
63
33
0
07 Aug 2021
Neuromorphic Algorithm-hardware Codesign for Temporal Pattern Learning
Haowen Fang
Brady Taylor
Ziru Li
Zaidao Mei
Hai Helen Li
Qinru Qiu
19
10
0
21 Apr 2021
Transformer in Transformer
Kai Han
An Xiao
Enhua Wu
Jianyuan Guo
Chunjing Xu
Yunhe Wang
ViT
282
1,490
0
27 Feb 2021
Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions
Wenhai Wang
Enze Xie
Xiang Li
Deng-Ping Fan
Kaitao Song
Ding Liang
Tong Lu
Ping Luo
Ling Shao
ViT
263
3,538
0
24 Feb 2021
Instance Localization for Self-supervised Detection Pretraining
Ceyuan Yang
Zhirong Wu
Bolei Zhou
Stephen Lin
ViT
SSL
95
144
0
16 Feb 2021
Bottleneck Transformers for Visual Recognition
A. Srinivas
Tsung-Yi Lin
Niki Parmar
Jonathon Shlens
Pieter Abbeel
Ashish Vaswani
SLR
267
955
0
27 Jan 2021
TrackFormer: Multi-Object Tracking with Transformers
Tim Meinhardt
A. Kirillov
Laura Leal-Taixe
Christoph Feichtenhofer
VOT
216
732
0
07 Jan 2021