HiRED: Attention-Guided Token Dropping for Efficient Inference of High-Resolution Vision-Language Models in Resource-Constrained Environments

20 August 2024
Kazi Hasan Ibn Arif
JinYi Yoon
Dimitrios S. Nikolopoulos
Hans Vandierendonck
Deepu John
Bo Ji
MLLM, VLM

Papers citing "HiRED: Attention-Guided Token Dropping for Efficient Inference of High-Resolution Vision-Language Models in Resource-Constrained Environments"

19 / 19 papers shown
Don't Just Chase "Highlighted Tokens" in MLLMs: Revisiting Visual Holistic Context Retention
Xin Zou
Di Lu
Yizhou Wang
Yibo Yan
Yuanhuiyi Lyu
Xu Zheng
Linfeng Zhang
Xuming Hu
VLM
225
5
0
03 Oct 2025
LightVLM: Acceleraing Large Multimodal Models with Pyramid Token Merging and KV Cache Compression
Lianyu Hu
Fanhua Shang
Wei Feng
Liang Wan
MLLM, VLM
104
0
0
30 Aug 2025
ReGATE: Learning Faster and Better with Fewer Tokens in MLLMs
Chaoyu Li
Yogesh Kulkarni
Pooyan Fazli
127
0
0
29 Jul 2025
Frequency-Aware Autoregressive Modeling for Efficient High-Resolution Image Synthesis
Zhuokun Chen
Jugang Fan
Zhuowei Yu
Bohan Zhuang
Zhuliang Yu
DiffM
116
3
0
28 Jul 2025
When Tokens Talk Too Much: A Survey of Multimodal Long-Context Token Compression across Images, Videos, and Audios
Kele Shao
Keda Tao
Kejia Zhang
Sicheng Feng
Mu Cai
Yuzhang Shang
Haoxuan You
Can Qin
Yang Sui
Huan Wang
429
9
0
27 Jul 2025
AuroraLong: Bringing RNNs Back to Efficient Open-Ended Video Understanding
Weili Xu
Enxin Song
Wenhao Chai
Xuexiang Wen
Tian-Chun Ye
Gaoang Wang
256
3
0
03 Jul 2025
VScan: Rethinking Visual Token Reduction for Efficient Large Vision-Language Models
Ce Zhang
Kaixin Ma
Tianqing Fang
Wenhao Yu
Hongming Zhang
Zhisong Zhang
Yaqi Xie
Katia Sycara
Haitao Mi
Dong Yu
VLM
246
5
0
28 May 2025
PACT: Pruning and Clustering-Based Token Reduction for Faster Visual Language Models
Computer Vision and Pattern Recognition (CVPR), 2025
M. Dhouib
Davide Buscaldi
Sonia Vanier
A. Shabou
VLM
269
15
0
11 Apr 2025
LLaVAction: evaluating and training multi-modal large language models for action recognition
Shaokai Ye
Haozhe Qi
Alexander Mathis
Mackenzie W. Mathis
302
3
0
24 Mar 2025
Multi-Cue Adaptive Visual Token Pruning for Large Vision-Language Models
Bozhi Luan
Wengang Zhou
Hao Feng
Zhe Wang
Xiaosong Li
Haoyang Li
VLM
249
0
0
11 Mar 2025
When Large Vision-Language Model Meets Large Remote Sensing Imagery: Coarse-to-Fine Text-Guided Token Pruning
Junwei Luo
Yingying Zhang
Xiaoyu Yang
Kang Wu
Qi Zhu
Lei Liang
Jingdong Chen
Yansheng Li
381
10
0
10 Mar 2025
ToFu: Visual Tokens Reduction via Fusion for Multi-modal, Multi-patch, Multi-image Task
Vittorio Pippi
Matthieu Guillaumin
S. Cascianelli
Rita Cucchiara
M. Jaritz
Loris Bazzani
171
0
0
06 Mar 2025
See What You Are Told: Visual Attention Sink in Large Multimodal Models
International Conference on Learning Representations (ICLR), 2025
Seil Kang
Jinyeong Kim
Junhyeok Kim
Seong Jae Hwang
VLM
282
42
0
05 Mar 2025
ECG-Byte: A Tokenizer for End-to-End Generative Electrocardiogram Language Modeling
William Jongwon Han
Chaojing Duan
M. Rosenberg
Emerson Liu
Ding Zhao
336
3
0
18 Dec 2024
LLaVA-Zip: Adaptive Visual Token Compression with Intrinsic Image Information
Ke Wang
Hong Xuan
VLM
228
2
0
11 Dec 2024
FocusLLaVA: A Coarse-to-Fine Approach for Efficient and Effective Visual Token Compression
Yuke Zhu
Chi Xie
Shuang Liang
Bo Zheng
Sheng Guo
263
15
0
21 Nov 2024
VL-Cache: Sparsity and Modality-Aware KV Cache Compression for Vision-Language Model Inference Acceleration
International Conference on Learning Representations (ICLR), 2024
Dezhan Tu
Danylo Vashchilenko
Yuzhe Lu
Panpan Xu
VLM
212
21
0
29 Oct 2024
PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction
Long Xing
Qidong Huang
Xiaoyi Dong
Jiajie Lu
Pan Zhang
...
Yuhang Cao
Bin Wang
Jiaqi Wang
Feng Wu
Dahua Lin
VLM
263
126
0
22 Oct 2024
ZipVL: Efficient Large Vision-Language Models with Dynamic Token Sparsification and KV Cache Compression
Yefei He
Feng Chen
Jing Liu
Wenqi Shao
Hong Zhou
Jianchao Tan
Bohan Zhuang
VLM
240
33
0
11 Oct 2024