
See What You Are Told: Visual Attention Sink in Large Multimodal Models
arXiv: 2503.03321

International Conference on Learning Representations (ICLR), 2025
5 March 2025
Seil Kang
Jinyeong Kim
Junhyeok Kim
Seong Jae Hwang
VLM

Papers citing "See What You Are Told: Visual Attention Sink in Large Multimodal Models"

33 / 33 papers shown
Attention Misses Visual Risk: Risk-Adaptive Steering for Multimodal Safety Alignment
Jonghyun Park
Minhyuk Seo
Jonghyun Choi
LLM, SV
30 Mar 2026
Tell Model Where to Look: Mitigating Hallucinations in MLLMs by Vision-Guided Attention
Jianfei Zhao
Feng Zhang
Xin Sun
Chong Feng
Zhixing Tan
MLLM, LRM
25 Nov 2025
Can Vision-Language Models Count? A Synthetic Benchmark and Analysis of Attention-Based Interventions
S. Sengupta
Nazanin Moradinasab
Jiebei Liu
Donald E. Brown
CoGe, VLM
21 Nov 2025
Attention Guided Alignment in Efficient Vision-Language Models
Shweta Mahajan
Hoang Le
Hyojin Park
Farzad Farhadzadeh
Munawar Hayat
Fatih Porikli
VLM
21 Nov 2025
Capturing Gaze Shifts for Guidance: Cross-Modal Fusion Enhancement for VLM Hallucination Mitigation
Zheng Qi
Chao Shang
Evangelia Spiliopoulou
Nikolaos Pappas
24 Oct 2025
Decomposed Attention Fusion in MLLMs for Training-Free Video Reasoning Segmentation
Su Ho Han
Jeongseok Hyun
Pilhyeon Lee
Minho Shim
Dongyoon Wee
Seon Joo Kim
VOS, VLM
22 Oct 2025
SparseVILA: Decoupling Visual Sparsity for Efficient VLM Inference
Samir Khaki
Junxian Guo
Jiaming Tang
Shang Yang
Yukang Chen
Konstantinos N. Plataniotis
Yao Lu
Song Han
Zhijian Liu
MLLM, VLM
20 Oct 2025
Segmentation as A Plug-and-Play Capability for Frozen Multimodal LLMs
Jiazhen Liu
Long Chen
MLLM, VLM
19 Oct 2025
SHIELD: Suppressing Hallucinations In LVLM Encoders via Bias and Vulnerability Defense
Y. Huang
Liang Shi
Yitian Zhang
Yi Tian Xu
Yun Fu
AAML
18 Oct 2025
Reallocating Attention Across Layers to Reduce Multimodal Hallucination
H. Lu
Bolun Chu
Weiye Fu
Guoshun Nan
Junning Liu
Minghui Pan
Qiankun Li
Yi Yu
Hua Wang
Kun Wang
LRM
11 Oct 2025
Value-State Gated Attention for Mitigating Extreme-Token Phenomena in Transformers
Rui Bu
Haofeng Zhong
Wenzheng Chen
Yangyan Li
10 Oct 2025
To Sink or Not to Sink: Visual Information Pathways in Large Vision-Language Models
Jiayun Luo
Wan-Cyuan Fan
Lyuyang Wang
Xiangteng He
Tanzila Rahman
Purang Abolmaesumi
Leonid Sigal
LRM
09 Oct 2025
Activation Quantization of Vision Encoders Needs Prefixing Registers
S. Kim
Jinho Kim
Taesun Yeom
Wonpyo Park
Kyuyeun Kim
Jaeho Lee
MQ, VLM
06 Oct 2025
HiDe: Rethinking The Zoom-IN method in High Resolution MLLMs via Hierarchical Decoupling
Xianjie Liu
Yiman Hu
Yixiong Zou
Liang Wu
Jian Xu
Bo Zheng
28 Sep 2025
RefAM: Attention Magnets for Zero-Shot Referral Segmentation
Anna Kukleva
Enis Simsar
A. Tonioni
Muhammad Ferjad Naeem
F. Tombari
J. E. Lenssen
Bernt Schiele
DiffM, VLM
26 Sep 2025
Catching the Details: Self-Distilled RoI Predictors for Fine-Grained MLLM Perception
Yuheng Shi
Xiaohuan Pei
Minjing Dong
Chang Xu
ObjD
21 Sep 2025
See&Trek: Training-Free Spatial Prompting for Multimodal Large Language Model
Pengteng Li
Pinhao Song
Wuyang Li
Weiyu Guo
Huizai Yao
Ziyang Chen
Dugang Liu
Hui Xiong
LRM, VLM
19 Sep 2025
Cross-Layer Vision Smoothing: Enhancing Visual Understanding via Sustained Focus on Key Objects in Large Vision-Language Models
Jianfei Zhao
Feng Zhang
Xin Sun
Lingxing Kong
Zhixing Tan
16 Sep 2025
Examining Vision Language Models through Multi-dimensional Experiments with Vision and Text Features
S. Sengupta
Nazanin Moradinasab
Jiebei Liu
Donald Brown
CoGe, VLM
10 Sep 2025
Tracing and Mitigating Hallucinations in Multimodal LLMs via Dynamic Attention Localization
Tiancheng Yang
L. Zhang
J. Lin
Guimin Hu
Haiyan Zhao
Lijie Hu
09 Sep 2025
GLSim: Detecting Object Hallucinations in LVLMs via Global-Local Similarity
Seongheon Park
Yixuan Li
27 Aug 2025
Multimodal Chain of Continuous Thought for Latent-Space Reasoning in Vision-Language Models
Tan-Hanh Pham
Chris Ngo
LRM
18 Aug 2025
A Survey of Multimodal Hallucination Evaluation and Detection
Zhiyuan Chen
Yuecong Min
Jie M. Zhang
Bei Yan
Jiahao Wang
X. Wang
Shiguang Shan
HILM
25 Jul 2025
Rethinking Explainability in the Era of Multimodal AI
Chirag Agarwal
16 Jun 2025
Revisit What You See: Disclose Language Prior in Vision Tokens for LVLM Decoding
Beomsik Cho
Jaehyung Kim
11 Jun 2025
When Semantics Mislead Vision: Mitigating Large Multimodal Models Hallucinations in Scene Text Spotting and Understanding
Yan Shu
Hangui Lin
Yexin Liu
Yan Zhang
Gangyan Zeng
Yan Li
Can Ma
Ser-Nam Lim
Harry Yang
Andrii Zadaianchuk
MLLM, VLM
05 Jun 2025
Don't Deceive Me: Mitigating Gaslighting through Attention Reallocation in LMMs
Pengkun Jiao
Bin Zhu
Yue Yu
Chong-Wah Ngo
Yu Jiang
13 Apr 2025
The Power of One: A Single Example is All it Takes for Segmentation in VLMs
Mir Rayat Imtiaz Hossain
Mennatullah Siam
Leonid Sigal
James J. Little
MLLM, VLM
13 Mar 2025
Your Large Vision-Language Model Only Needs A Few Attention Heads For Visual Grounding
Computer Vision and Pattern Recognition (CVPR), 2025
Seil Kang
Jinyeong Kim
Junhyeok Kim
Seong Jae Hwang
VLM
08 Mar 2025
Visual Attention Never Fades: Selective Progressive Attention ReCalibration for Detailed Image Captioning in Multimodal Large Language Models
Mingi Jung
Saehuyng Lee
Eunji Kim
Sungroh Yoon
03 Feb 2025
Distilling Spectral Graph for Object-Context Aware Open-Vocabulary Semantic Segmentation
Computer Vision and Pattern Recognition (CVPR), 2024
Chanyoung Kim
Dayun Ju
Woojung Han
Ming-Hsuan Yang
Seong Jae Hwang
VLM, VOS
26 Nov 2024
MaskControl: Spatio-Temporal Control for Masked Motion Synthesis
Ekkasit Pinyoanuntapong
Muhammad Usama Saleem
Korrawe Karunratanakul
Pu Wang
Hongfei Xue
Chong Chen
Chuan Guo
Junli Cao
J. Ren
Sergey Tulyakov
VGen
14 Oct 2024
Towards Interpreting Visual Information Processing in Vision-Language Models
International Conference on Learning Representations (ICLR), 2024
Philip Quirke
Luke Ong
Juil Sock
Mor Geva
David M. Krueger
Fazl Barez
09 Oct 2024