Interpreting CLIP's Image Representation via Text-Based Decomposition

International Conference on Learning Representations (ICLR), 2023

9 October 2023

Papers citing "Interpreting CLIP's Image Representation via Text-Based Decomposition"

50 / 122 papers shown

Too Late to Recall: Explaining the Two-Hop Problem in Multimodal Knowledge Retrieval

02 Dec 2025

InstanceV: Instance-Level Video Generation

28 Nov 2025

Mechanistic Finetuning of Vision-Language-Action Models via Few-Shot Demonstrations

27 Nov 2025

Representation-Level Counterfactual Calibration for Debiased Zero-Shot Recognition

30 Oct 2025

Enhancing Pre-trained Representation Classifiability can Boost its Interpretability

International Conference on Learning Representations (ICLR), 2025

28 Oct 2025

Understanding Multi-View Transformers

28 Oct 2025

Improving Visual Discriminability of CLIP for Training-Free Open-Vocabulary Semantic Segmentation

27 Oct 2025

VL-SAE: Interpreting and Enhancing Vision-Language Alignment with a Unified Concept Set

24 Oct 2025

Automated Detection of Visual Attribute Reliance with a Self-Reflective Agent

24 Oct 2025

Head Pursuit: Probing Attention Specialization in Multimodal Transformers

24 Oct 2025

Enhancing Concept Localization in CLIP-based Concept Bottleneck Models

08 Oct 2025

Conditional Representation Learning for Customized Tasks

06 Oct 2025

Visual Representations inside the Language Model

Benlin Liu

Amita Kamath

Madeleine Grunde-McLaughlin

Winson Han

Ranjay Krishna

06 Oct 2025

TextCAM: Explaining Class Activation Map with Text

01 Oct 2025

Interpret, prune and distill Donut : towards lightweight VLMs for VQA on document

Adnan Ben Mansour

Ayoub Karine

D. Naccache

30 Sep 2025

REMA: A Unified Reasoning Manifold Framework for Interpreting Large Language Model

26 Sep 2025

RefAM: Attention Magnets for Zero-Shot Referral Segmentation

Anna Kukleva

Enis Simsar

A. Tonioni

Muhammad Ferjad Naeem

26 Sep 2025

Statistical Inference Leveraging Synthetic Data with Distribution-Free Guarantees

24 Sep 2025

Interpreting ResNet-based CLIP via Neuron-Attention Decomposition

Edmund Bu

Yossi Gandelsman

24 Sep 2025

Reading Images Like Texts: Sequential Image Understanding in Vision-Language Models

23 Sep 2025

TensLoRA: Tensor Alternatives for Low-Rank Adaptation

François Leduc-Primeau

22 Sep 2025

V-SEAM: Visual Semantic Editing and Attention Modulating for Causal Interpretability of Vision-Language Models

Qidong Wang

Junjie Hu

Ming Jiang

18 Sep 2025

Attention Lattice Adapter: Visual Explanation Generation for Visual Foundation Model

18 Sep 2025

Discovering Divergent Representations between Text-to-Image Models

10 Sep 2025

Singular Value Few-shot Adaptation of Vision-Language Models

03 Sep 2025

Disentangling Latent Embeddings with Sparse Linear Concept Subspaces (SLiCS)

27 Aug 2025

Model Science: getting serious about verification, explanation and control of AI systems

Przemyslaw Biecek

Wojciech Samek

27 Aug 2025

From Global to Local: Social Bias Transfer in CLIP

25 Aug 2025

Do VLMs Have Bad Eyes? Diagnosing Compositional Failures via Mechanistic Interpretability

Ashwath Vaithinathan Aravindan

Abha Jha

Mihir Kulkarni

CoGe

20 Aug 2025

Preserve and Sculpt: Manifold-Aligned Fine-tuning of Vision-Language Models for Few-Shot Learning

18 Aug 2025

Probing the Representational Power of Sparse Autoencoders in Vision Models

15 Aug 2025

Explaining Similarity in Vision-Language Encoders with Weighted Banzhaf Interactions

07 Aug 2025

Unraveling Hidden Representations: A Multi-Modal Layer Analysis for Better Synthetic Content Forensics

Tom Or

Omri Azencot

AAML

01 Aug 2025

Attention (as Discrete-Time Markov) Chains

23 Jul 2025

Not All Attention Heads Are What You Need: Refining CLIP's Image Representation with Attention Ablation

01 Jul 2025

Quantifying Structure in CLIP Embeddings: A Statistical Framework for Concept Interpretation

16 Jun 2025

How Visual Representations Map to Language Feature Space in Multimodal LLMs

13 Jun 2025

Where and How to Perturb: On the Design of Perturbation Guidance in Diffusion and Flow Models

12 Jun 2025

Improving Personalized Search with Regularized Low-Rank Parameter Updates

Computer Vision and Pattern Recognition (CVPR), 2025

11 Jun 2025

Same Task, Different Circuits: Disentangling Modality-Specific Mechanisms in VLMs

10 Jun 2025

LLMs Can Compensate for Deficiencies in Visual Representations

Yova Kementchedjhieva

VLM

05 Jun 2025

From Flat to Hierarchical: Extracting Sparse Representations with Matching Pursuit

03 Jun 2025

Concept-Centric Token Interpretation for Vector-Quantized Generative Models

31 May 2025

VScan: Rethinking Visual Token Reduction for Efficient Large Vision-Language Models

28 May 2025

Domain Adaptation of Attention Heads for Zero-shot Anomaly Detection

28 May 2025

In-Context Brush: Zero-shot Customized Subject Insertion with Context-Aware Latent Space Manipulation

26 May 2025

From What to How: Attributing CLIP's Latent Components Reveals Unexpected Semantic Reliance

26 May 2025

Debiasing CLIP: Interpreting and Correcting Bias in Attention Heads

23 May 2025

Multimodal Conditional Information Bottleneck for Generalizable AI-Generated Image Detection

21 May 2025

Task Reconstruction and Extrapolation for π_0 using Text Latent

Quanyi Li

06 May 2025