Interpreting CLIP's Image Representation via Text-Based Decomposition

International Conference on Learning Representations (ICLR), 2023
9 October 2023
Yossi Gandelsman
Alexei A. Efros
Jacob Steinhardt
    VLM

Papers citing "Interpreting CLIP's Image Representation via Text-Based Decomposition"

50 / 122 papers shown
Too Late to Recall: Explaining the Two-Hop Problem in Multimodal Knowledge Retrieval
Constantin Venhoff
Ashkan Khakzar
Sonia Joseph
Philip Torr
Neel Nanda
58
0
0
02 Dec 2025
InstanceV: Instance-Level Video Generation
Yuheng Chen
Teng Hu
Jiangning Zhang
Zhucun Xue
Ran Yi
Lizhuang Ma
DiffM, VGen
121
0
0
28 Nov 2025
Mechanistic Finetuning of Vision-Language-Action Models via Few-Shot Demonstrations
Chancharik Mitra
Yusen Luo
Raj Saravanan
Dantong Niu
Anirudh Pai
Jesse Thomason
Trevor Darrell
Abrar Anwar
Deva Ramanan
Roei Herzig
52
0
0
27 Nov 2025
Representation-Level Counterfactual Calibration for Debiased Zero-Shot Recognition
Pei Peng
MingKun Xie
Hang Hao
Tong Jin
ShengJun Huang
BDL, CML
295
0
0
30 Oct 2025
Enhancing Pre-trained Representation Classifiability can Boost its Interpretability
International Conference on Learning Representations (ICLR), 2025
Shufan Shen
Zhaobo Qi
Junshu Sun
Qingming Huang
Qi Tian
Shuhui Wang
FAtt
418
4
0
28 Oct 2025
Understanding Multi-View Transformers
Michal Stary
Julien Gaubil
A. Tewari
Vincent Sitzmann
ViT
90
1
0
28 Oct 2025
Improving Visual Discriminability of CLIP for Training-Free Open-Vocabulary Semantic Segmentation
Jinxin Zhou
Jiachen Jiang
Zhihui Zhu
VLM
202
0
0
27 Oct 2025
VL-SAE: Interpreting and Enhancing Vision-Language Alignment with a Unified Concept Set
Shufan Shen
Junshu Sun
Qingming Huang
Shuhui Wang
149
1
0
24 Oct 2025
Automated Detection of Visual Attribute Reliance with a Self-Reflective Agent
Christy Li
Josep Lopez Camunas
Jake Thomas Touchet
Jacob Andreas
Àgata Lapedriza
Antonio Torralba
Tamar Rott Shaham
195
0
0
24 Oct 2025
Head Pursuit: Probing Attention Specialization in Multimodal Transformers
Lorenzo Basile
Valentino Maiorca
Diego Doimo
Francesco Locatello
Alberto Cazzaniga
123
2
0
24 Oct 2025
Enhancing Concept Localization in CLIP-based Concept Bottleneck Models
Rémi Kazmierczak
Steve Azzolin
Eloise Berthier
Goran Frehse
Gianni Franchi
167
0
0
08 Oct 2025
Conditional Representation Learning for Customized Tasks
Honglin Liu
Chao Sun
Peng Hu
Yunfan Li
Xi Peng
160
0
0
06 Oct 2025
Visual Representations inside the Language Model
Benlin Liu
Amita Kamath
Madeleine Grunde-McLaughlin
Winson Han
Ranjay Krishna
151
2
0
06 Oct 2025
TextCAM: Explaining Class Activation Map with Text
Qiming Zhao
Xingjian Li
Xiaoyu Cao
Xiaolong Wu
Min Xu
VLM
121
0
0
01 Oct 2025
Interpret, prune and distill Donut : towards lightweight VLMs for VQA on document
Adnan Ben Mansour
Ayoub Karine
D. Naccache
131
0
0
30 Sep 2025
REMA: A Unified Reasoning Manifold Framework for Interpreting Large Language Model
Bo Li
Guanzhi Deng
Ronghao Chen
Junrong Yue
Shuo Zhang
Qinghua Zhao
Linqi Song
Lijie Wen
LRM
111
1
0
26 Sep 2025
RefAM: Attention Magnets for Zero-Shot Referral Segmentation
Anna Kukleva
Enis Simsar
A. Tonioni
Muhammad Ferjad Naeem
F. Tombari
J. E. Lenssen
Bernt Schiele
DiffM, VLM
645
0
0
26 Sep 2025
Statistical Inference Leveraging Synthetic Data with Distribution-Free Guarantees
Meshi Bashari
Yonghoon Lee
Roy Maor Lotan
Edgar Dobriban
Yaniv Romano
SyDa
186
1
0
24 Sep 2025
Interpreting ResNet-based CLIP via Neuron-Attention Decomposition
Edmund Bu
Yossi Gandelsman
226
0
0
24 Sep 2025
Reading Images Like Texts: Sequential Image Understanding in Vision-Language Models
Yueyan Li
Chenggong Zhao
Zeyuan Zang
Caixia Yuan
Xiaojie Wang
VLM
129
0
0
23 Sep 2025
TensLoRA: Tensor Alternatives for Low-Rank Adaptation
Axel Marmoret
Reda Bensaid
Jonathan Lys
Vincent Gripon
François Leduc-Primeau
89
0
0
22 Sep 2025
V-SEAM: Visual Semantic Editing and Attention Modulating for Causal Interpretability of Vision-Language Models
Qidong Wang
Junjie Hu
Ming Jiang
104
0
0
18 Sep 2025
Attention Lattice Adapter: Visual Explanation Generation for Visual Foundation Model
Shinnosuke Hirano
Yuiga Wada
T. Iida
Komei Sugiura
143
0
0
18 Sep 2025
Discovering Divergent Representations between Text-to-Image Models
Lisa Dunlap
Joseph E. Gonzalez
Trevor Darrell
Fabian Caba Heilbron
Josef Sivic
Bryan C. Russell
EGVM
126
0
0
10 Sep 2025
Singular Value Few-shot Adaptation of Vision-Language Models
Taha Koleilat
H. Rivaz
Yiming Xiao
VLM
249
0
0
03 Sep 2025
Disentangling Latent Embeddings with Sparse Linear Concept Subspaces (SLiCS)
Zhi Li
Hau Phan
Matthew Emigh
Austin J. Brockmeier
CoGe
161
0
0
27 Aug 2025
Model Science: getting serious about verification, explanation and control of AI systems
Przemyslaw Biecek
Wojciech Samek
120
0
0
27 Aug 2025
From Global to Local: Social Bias Transfer in CLIP
Ryan Ramos
Yusuke Hirota
Yuta Nakashima
Noa Garcia
118
0
0
25 Aug 2025
Do VLMs Have Bad Eyes? Diagnosing Compositional Failures via Mechanistic Interpretability
Ashwath Vaithinathan Aravindan
Abha Jha
Mihir Kulkarni
CoGe
163
1
0
20 Aug 2025
Preserve and Sculpt: Manifold-Aligned Fine-tuning of Vision-Language Models for Few-Shot Learning
Dexia Chen
Qianjie Zhu
Weibing Li
Yue Yu
Tong Zhang
Ruixuan Wang
135
0
0
18 Aug 2025
Probing the Representational Power of Sparse Autoencoders in Vision Models
Matthew Lyle Olson
Musashi Hinck
Neale Ratzlaff
Changbai Li
Phillip Howard
Vasudev Lal
Shao-Yen Tseng
212
1
0
15 Aug 2025
Explaining Similarity in Vision-Language Encoders with Weighted Banzhaf Interactions
Hubert Baniecki
Maximilian Muschalik
Fabian Fumagalli
Barbara Hammer
Eyke Hüllermeier
P. Biecek
FAtt
230
0
0
07 Aug 2025
Unraveling Hidden Representations: A Multi-Modal Layer Analysis for Better Synthetic Content Forensics
Tom Or
Omri Azencot
AAML
189
1
0
01 Aug 2025
Attention (as Discrete-Time Markov) Chains
Yotam Erel
Olaf Dünkel
Rishabh Dabral
Vladislav Golyanik
Christian Theobalt
Amit H. Bermano
292
1
0
23 Jul 2025
Not All Attention Heads Are What You Need: Refining CLIP's Image Representation with Attention Ablation
Feng Lin
Marco Chen
Haokui Zhang
Xiaotian Yu
Guangming Lu
Rong Xiao
110
0
0
01 Jul 2025
Quantifying Structure in CLIP Embeddings: A Statistical Framework for Concept Interpretation
Jitian Zhao
Chenghui Li
Frederic Sala
Karl Rohe
184
2
0
16 Jun 2025
How Visual Representations Map to Language Feature Space in Multimodal LLMs
Constantin Venhoff
Ashkan Khakzar
Sonia Joseph
Juil Sock
Neel Nanda
295
9
0
13 Jun 2025
Where and How to Perturb: On the Design of Perturbation Guidance in Diffusion and Flow Models
Donghoon Ahn
Jiwon Kang
Sanghyun Lee
Minjae Kim
Jaewon Min
Wooseok Jang
Saungwu Lee
Sayak Paul
S. Hong
Seungryong Kim
DiffM, AAML
473
0
0
12 Jun 2025
Improving Personalized Search with Regularized Low-Rank Parameter Updates
Computer Vision and Pattern Recognition (CVPR), 2025
Fiona Ryan
Josef Sivic
Fabian Caba Heilbron
Judy Hoffman
James M. Rehg
Bryan C. Russell
224
1
0
11 Jun 2025
Same Task, Different Circuits: Disentangling Modality-Specific Mechanisms in VLMs
Yaniv Nikankin
Dana Arad
Yossi Gandelsman
Yonatan Belinkov
321
6
0
10 Jun 2025
LLMs Can Compensate for Deficiencies in Visual Representations
Sho Takishita
Jay Gala
Abdelrahman Mohamed
Kentaro Inui
Yova Kementchedjhieva
VLM
215
0
0
05 Jun 2025
From Flat to Hierarchical: Extracting Sparse Representations with Matching Pursuit
Valérie Costa
Thomas Fel
Ekdeep Singh Lubana
Bahareh Tolooshams
Demba Ba
318
9
0
03 Jun 2025
Concept-Centric Token Interpretation for Vector-Quantized Generative Models
Tianze Yang
Yucheng Shi
Mengnan Du
Xuansheng Wu
Qiaoyu Tan
Jin Sun
Ninghao Liu
272
2
0
31 May 2025
VScan: Rethinking Visual Token Reduction for Efficient Large Vision-Language Models
Ce Zhang
Kaixin Ma
Tianqing Fang
Wenhao Yu
Hongming Zhang
Zhisong Zhang
Yaqi Xie
Katia Sycara
Haitao Mi
Dong Yu
VLM
312
7
0
28 May 2025
Domain Adaptation of Attention Heads for Zero-shot Anomaly Detection
Kiyoon Jeong
Jaehyuk Heo
Junyeong Son
Pilsung Kang
VLM
187
0
0
28 May 2025
In-Context Brush: Zero-shot Customized Subject Insertion with Context-Aware Latent Space Manipulation
Yu Xu
Fan Tang
You Wu
Lin Gao
Oliver Deussen
Hongbin Yan
Jintao Li
Juan Cao
Tong-Yee Lee
DiffM
208
2
0
26 May 2025
From What to How: Attributing CLIP's Latent Components Reveals Unexpected Semantic Reliance
Maximilian Dreyer
Lorenz Hufe
J. Berend
Thomas Wiegand
Sebastian Lapuschkin
Wojciech Samek
262
2
0
26 May 2025
Debiasing CLIP: Interpreting and Correcting Bias in Attention Heads
Wei Jie Yeo
Rui Mao
Moloud Abdar
Erik Cambria
Frank Xing
286
3
0
23 May 2025
Multimodal Conditional Information Bottleneck for Generalizable AI-Generated Image Detection
Haotian Qin
Dongliang Chang
Y. Gao
Bingyao Yu
Lei Chen
Zhanyu Ma
315
1
0
21 May 2025
Task Reconstruction and Extrapolation for $\pi_0$ using Text Latent
Quanyi Li
642
2
0
06 May 2025