
See What You Are Told: Visual Attention Sink in Large Multimodal Models
arXiv: 2503.03321

International Conference on Learning Representations (ICLR), 2025
5 March 2025
Seil Kang
Jinyeong Kim
Junhyeok Kim
Seong Jae Hwang
VLM

Papers citing "See What You Are Told: Visual Attention Sink in Large Multimodal Models"

33 / 33 papers shown
Attention Misses Visual Risk: Risk-Adaptive Steering for Multimodal Safety Alignment
Jonghyun Park
Minhyuk Seo
Jonghyun Choi
LLM, SV
30 Mar 2026
Tell Model Where to Look: Mitigating Hallucinations in MLLMs by Vision-Guided Attention
Jianfei Zhao
Feng Zhang
Xin Sun
Chong Feng
Zhixing Tan
MLLM, LRM
25 Nov 2025
Can Vision-Language Models Count? A Synthetic Benchmark and Analysis of Attention-Based Interventions
S. Sengupta
Nazanin Moradinasab
Jiebei Liu
Donald E. Brown
CoGe, VLM
21 Nov 2025
Attention Guided Alignment in Efficient Vision-Language Models
Shweta Mahajan
Hoang Le
Hyojin Park
Farzad Farhadzadeh
Munawar Hayat
Fatih Porikli
VLM
21 Nov 2025
Capturing Gaze Shifts for Guidance: Cross-Modal Fusion Enhancement for VLM Hallucination Mitigation
Zheng Qi
Chao Shang
Evangelia Spiliopoulou
Nikolaos Pappas
24 Oct 2025
Decomposed Attention Fusion in MLLMs for Training-Free Video Reasoning Segmentation
Su Ho Han
Jeongseok Hyun
Pilhyeon Lee
Minho Shim
Dongyoon Wee
Seon Joo Kim
VOS, VLM
22 Oct 2025
SparseVILA: Decoupling Visual Sparsity for Efficient VLM Inference
Samir Khaki
Junxian Guo
Jiaming Tang
Shang Yang
Yukang Chen
Konstantinos N. Plataniotis
Yao Lu
Song Han
Zhijian Liu
MLLM, VLM
20 Oct 2025
Segmentation as A Plug-and-Play Capability for Frozen Multimodal LLMs
Jiazhen Liu
Long Chen
MLLM, VLM
19 Oct 2025
SHIELD: Suppressing Hallucinations In LVLM Encoders via Bias and Vulnerability Defense
Y. Huang
Liang Shi
Yitian Zhang
Yi Tian Xu
Yun Fu
AAML
18 Oct 2025
Reallocating Attention Across Layers to Reduce Multimodal Hallucination
H. Lu
Bolun Chu
Weiye Fu
Guoshun Nan
Junning Liu
Minghui Pan
Qiankun Li
Yi Yu
Hua Wang
Kun Wang
LRM
11 Oct 2025
Value-State Gated Attention for Mitigating Extreme-Token Phenomena in Transformers
Rui Bu
Haofeng Zhong
Wenzheng Chen
Yangyan Li
10 Oct 2025
To Sink or Not to Sink: Visual Information Pathways in Large Vision-Language Models
Jiayun Luo
Wan-Cyuan Fan
Lyuyang Wang
Xiangteng He
Tanzila Rahman
Purang Abolmaesumi
Leonid Sigal
LRM
09 Oct 2025
Activation Quantization of Vision Encoders Needs Prefixing Registers
S. Kim
Jinho Kim
Taesun Yeom
Wonpyo Park
Kyuyeun Kim
Jaeho Lee
MQ, VLM
06 Oct 2025
HiDe: Rethinking The Zoom-IN method in High Resolution MLLMs via Hierarchical Decoupling
Xianjie Liu
Yiman Hu
Yixiong Zou
Liang Wu
Jian Xu
Bo Zheng
28 Sep 2025
RefAM: Attention Magnets for Zero-Shot Referral Segmentation
Anna Kukleva
Enis Simsar
A. Tonioni
Muhammad Ferjad Naeem
F. Tombari
J. E. Lenssen
Bernt Schiele
DiffM, VLM
26 Sep 2025
Catching the Details: Self-Distilled RoI Predictors for Fine-Grained MLLM Perception
Yuheng Shi
Xiaohuan Pei
Minjing Dong
Chang Xu
ObjD
21 Sep 2025
See&Trek: Training-Free Spatial Prompting for Multimodal Large Language Model
Pengteng Li
Pinhao Song
Wuyang Li
Weiyu Guo
Huizai Yao
Ziyang Chen
Dugang Liu
Hui Xiong
LRM, VLM
19 Sep 2025
Cross-Layer Vision Smoothing: Enhancing Visual Understanding via Sustained Focus on Key Objects in Large Vision-Language Models
Jianfei Zhao
Feng Zhang
Xin Sun
Lingxing Kong
Zhixing Tan
16 Sep 2025
Examining Vision Language Models through Multi-dimensional Experiments with Vision and Text Features
S. Sengupta
Nazanin Moradinasab
Jiebei Liu
Donald Brown
CoGe, VLM
10 Sep 2025
Tracing and Mitigating Hallucinations in Multimodal LLMs via Dynamic Attention Localization
Tiancheng Yang
L. Zhang
J. Lin
Guimin Hu
Haiyan Zhao
Lijie Hu
09 Sep 2025
GLSim: Detecting Object Hallucinations in LVLMs via Global-Local Similarity
Seongheon Park
Yixuan Li
27 Aug 2025
Multimodal Chain of Continuous Thought for Latent-Space Reasoning in Vision-Language Models
Tan-Hanh Pham
Chris Ngo
LRM
18 Aug 2025
A Survey of Multimodal Hallucination Evaluation and Detection
Zhiyuan Chen
Yuecong Min
Jie M. Zhang
Bei Yan
Jiahao Wang
X. Wang
Shiguang Shan
HILM
25 Jul 2025
Rethinking Explainability in the Era of Multimodal AI
Chirag Agarwal
16 Jun 2025
Revisit What You See: Disclose Language Prior in Vision Tokens for LVLM Decoding
Beomsik Cho
Jaehyung Kim
11 Jun 2025
When Semantics Mislead Vision: Mitigating Large Multimodal Models Hallucinations in Scene Text Spotting and Understanding
Yan Shu
Hangui Lin
Yexin Liu
Yan Zhang
Gangyan Zeng
Yan Li
Can Ma
Ser-Nam Lim
Harry Yang
Andrii Zadaianchuk
MLLM, VLM
05 Jun 2025
Don't Deceive Me: Mitigating Gaslighting through Attention Reallocation in LMMs
Pengkun Jiao
Bin Zhu
Yue Yu
Chong-Wah Ngo
Yu Jiang
13 Apr 2025
The Power of One: A Single Example is All it Takes for Segmentation in VLMs
Mir Rayat Imtiaz Hossain
Mennatullah Siam
Leonid Sigal
James J. Little
MLLM, VLM
13 Mar 2025
Your Large Vision-Language Model Only Needs A Few Attention Heads For Visual Grounding
Computer Vision and Pattern Recognition (CVPR), 2025
Seil Kang
Jinyeong Kim
Junhyeok Kim
Seong Jae Hwang
VLM
08 Mar 2025
Visual Attention Never Fades: Selective Progressive Attention ReCalibration for Detailed Image Captioning in Multimodal Large Language Models
Mingi Jung
Saehuyng Lee
Eunji Kim
Sungroh Yoon
03 Feb 2025
Distilling Spectral Graph for Object-Context Aware Open-Vocabulary Semantic Segmentation
Computer Vision and Pattern Recognition (CVPR), 2024
Chanyoung Kim
Dayun Ju
Woojung Han
Ming-Hsuan Yang
Seong Jae Hwang
VLM, VOS
26 Nov 2024
MaskControl: Spatio-Temporal Control for Masked Motion Synthesis
Ekkasit Pinyoanuntapong
Muhammad Usama Saleem
Korrawe Karunratanakul
Pu Wang
Hongfei Xue
Chong Chen
Chuan Guo
Junli Cao
J. Ren
Sergey Tulyakov
VGen
14 Oct 2024
Towards Interpreting Visual Information Processing in Vision-Language Models
International Conference on Learning Representations (ICLR), 2024
Philip Quirke
Luke Ong
Juil Sock
Mor Geva
David M. Krueger
Fazl Barez
09 Oct 2024