ResearchTrend.AI

Beyond Attentive Tokens: Incorporating Token Importance and Diversity for Efficient Vision Transformers
arXiv: 2211.11315

21 November 2022
Sifan Long
Z. Zhao
Jimin Pi
Sheng-sheng Wang
Jingdong Wang

Papers citing "Beyond Attentive Tokens: Incorporating Token Importance and Diversity for Efficient Vision Transformers"

23 / 23 papers shown
When Less is Enough: Adaptive Token Reduction for Efficient Image Representation
Eduard Allakhverdov
Elizaveta Goncharova
Andrey Kuznetsov
42
0
0
20 Mar 2025
ImagePiece: Content-aware Re-tokenization for Efficient Image Recognition
Seungdong Yoa
Seungjun Lee
Hyeseung Cho
Bumsoo Kim
Woohyung Lim
ViT
67
0
0
21 Dec 2024
Deploying Foundation Model Powered Agent Services: A Survey
Wenchao Xu
Jinyu Chen
Peirong Zheng
Xiaoquan Yi
Tianyi Tian
...
Quan Wan
Haozhao Wang
Yunfeng Fan
Qinliang Su
Xuemin Shen
AI4CE
115
1
0
18 Dec 2024
Token Cropr: Faster ViTs for Quite a Few Tasks
Benjamin Bergner
C. Lippert
Aravindh Mahendran
ViT
VLM
64
0
0
01 Dec 2024
Patch Ranking: Efficient CLIP by Learning to Rank Local Patches
Cheng-En Wu
Jinhong Lin
Yu Hen Hu
Pedro Morgado
VLM
18
0
0
22 Sep 2024
Agglomerative Token Clustering
Joakim Bruslund Haurum
Sergio Escalera
Graham W. Taylor
T. Moeslund
29
1
0
18 Sep 2024
Recoverable Compression: A Multimodal Vision Token Recovery Mechanism Guided by Text Information
Yi Chen
Jian Xu
Xu-Yao Zhang
Wen-Zhuo Liu
Yang-Yang Liu
Cheng-Lin Liu
24
3
0
02 Sep 2024
TempMe: Video Temporal Token Merging for Efficient Text-Video Retrieval
Leqi Shen
Tianxiang Hao
Tao He
Sicheng Zhao
Pengzhang Liu
Yongjun Bao
Guiguang Ding
97
7
0
02 Sep 2024
Make Your ViT-based Multi-view 3D Detectors Faster via Token Compression
Dingyuan Zhang
Dingkang Liang
Zichang Tan
Xiaoqing Ye
Cheng Zhang
Jingdong Wang
Xiang Bai
ViT
44
2
0
01 Sep 2024
Vote&Mix: Plug-and-Play Token Reduction for Efficient Vision Transformer
Shuai Peng
Di Fu
Baole Wei
Yong Cao
Liangcai Gao
Zhi Tang
ViT
35
1
0
30 Aug 2024
PRANCE: Joint Token-Optimization and Structural Channel-Pruning for Adaptive ViT Inference
Ye Li
Chen Tang
Yuan Meng
Jiajun Fan
Zenghao Chai
Xinzhu Ma
Zhi Wang
Wenwu Zhu
29
1
0
06 Jul 2024
ALGM: Adaptive Local-then-Global Token Merging for Efficient Semantic Segmentation with Plain Vision Transformers
Narges Norouzi
Svetlana Orlova
Daan de Geus
Gijs Dubbelman
ViT
FedML
44
3
0
14 Jun 2024
SCE-MAE: Selective Correspondence Enhancement with Masked Autoencoder for Self-Supervised Landmark Estimation
Kejia Yin
Varshanth R. Rao
R. Jiang
Xudong Liu
P. Aarabi
David B. Lindell
35
0
0
28 May 2024
Data-independent Module-aware Pruning for Hierarchical Vision Transformers
Yang He
Joey Tianyi Zhou
ViT
40
3
0
21 Apr 2024
Leveraging Temporal Contextualization for Video Action Recognition
Minji Kim
Dongyoon Han
Taekyung Kim
Bohyung Han
43
2
0
15 Apr 2024
Arena: A Patch-of-Interest ViT Inference Acceleration System for Edge-Assisted Video Analytics
Haosong Peng
Wei Feng
Hao Li
Yufeng Zhan
Qihua Zhou
Yuanqing Xia
26
2
0
14 Apr 2024
Multi-criteria Token Fusion with One-step-ahead Attention for Efficient Vision Transformers
Sanghyeok Lee
Joonmyung Choi
Hyunwoo J. Kim
ViT
31
7
0
15 Mar 2024
Hourglass Tokenizer for Efficient Transformer-Based 3D Human Pose Estimation
Wenhao Li
Mengyuan Liu
Hong Liu
Pichao Wang
Jia Cai
N. Sebe
ViT
3DH
19
10
0
20 Nov 2023
Which Tokens to Use? Investigating Token Reduction in Vision Transformers
Joakim Bruslund Haurum
Sergio Escalera
Graham W. Taylor
T. Moeslund
ViT
36
33
0
09 Aug 2023
Multi-scale Efficient Graph-Transformer for Whole Slide Image Classification
Saisai Ding
Juncheng Li
Jun Wang
Shihui Ying
Jun Shi
ViT
MedIm
19
7
0
25 May 2023
Expediting Large-Scale Vision Transformer for Dense Prediction without Fine-tuning
Weicong Liang
Yuhui Yuan
Henghui Ding
Xiao Luo
Weihong Lin
Ding Jia
Zheng-Wei Zhang
Chao Zhang
Hanhua Hu
17
25
0
03 Oct 2022
Transformer in Transformer
Kai Han
An Xiao
Enhua Wu
Jianyuan Guo
Chunjing Xu
Yunhe Wang
ViT
282
1,518
0
27 Feb 2021
Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions
Wenhai Wang
Enze Xie
Xiang Li
Deng-Ping Fan
Kaitao Song
Ding Liang
Tong Lu
Ping Luo
Ling Shao
ViT
263
3,604
0
24 Feb 2021