Interpreting CLIP's Image Representation via Text-Based Decomposition

International Conference on Learning Representations (ICLR), 2023
9 October 2023
Yossi Gandelsman
Alexei A. Efros
Jacob Steinhardt
    VLM

Papers citing "Interpreting CLIP's Image Representation via Text-Based Decomposition"

50 / 122 papers shown
Multimodal Language Models See Better When They Look Shallower
Wei Xu
Junyan Lin
Xinhao Chen
Yue Fan
Jianfeng Dong
Hui Su
Jianfeng Dong
Jinlan Fu
Xiaoyu Shen
VLM
356
4
0
30 Apr 2025
Prisma: An Open Source Toolkit for Mechanistic Interpretability in Vision and Video
Sonia Joseph
Praneet Suresh
Lorenz Hufe
Edward Stevinson
Robert Graham
Yash Vadi
Danilo Bzdok
Sebastian Lapuschkin
Lee Sharkey
Blake A. Richards
546
10
0
28 Apr 2025
Decoding Vision Transformers: the Diffusion Steering Lens
Ryota Takatsuki
Sonia Joseph
Ippei Fujisawa
Ryota Kanai
DiffM
385
0
0
18 Apr 2025
Transformation of audio embeddings into interpretable, concept-based representations
Alice Zhang
Edison Thomaz
Lie Lu
221
0
0
18 Apr 2025
Understanding Attention Mechanism in Video Diffusion Models
Bingyan Liu
Chengyu Wang
Tongtong Su
Huan Ten
Jun Huang
K. Guo
Kui Jia
VGen
342
2
0
16 Apr 2025
Steering CLIP's vision transformer with sparse autoencoders
Sonia Joseph
Praneet Suresh
Ethan Goldfarb
Lorenz Hufe
Yossi Gandelsman
Robert Graham
Danilo Bzdok
Wojciech Samek
Blake A. Richards
286
13
0
11 Apr 2025
MASS: MoErging through Adaptive Subspace Selection
Donato Crisostomi
Alessandro Zirilli
Antonio Andrea Gargiulo
Maria Sofia Bucarelli
Simone Scardapane
Fabrizio Silvestri
Iacopo Masi
Emanuele Rodolà
MoMe
293
0
0
06 Apr 2025
Sparse Autoencoders Learn Monosemantic Features in Vision-Language Models
Mateusz Pach
Shyamgopal Karthik
Quentin Bouniot
Serge Belongie
Zeynep Akata
VLM
561
12
0
03 Apr 2025
Embedding Shift Dissection on CLIP: Effects of Augmentations on VLM's Representation Learning
Ashim Dahal
Saydul Akbar Murad
Nick Rahimi
VLM
354
2
0
30 Mar 2025
Attribute-formed Class-specific Concept Space: Endowing Language Bottleneck Model with Better Interpretability and Scalability
Computer Vision and Pattern Recognition (CVPR), 2025
Jianyang Zhang
Qianli Luo
Guowu Yang
Wenjing Yang
Weide Liu
Guosheng Lin
Fengmao Lv
297
0
0
26 Mar 2025
fine-CLIP: Enhancing Zero-Shot Fine-Grained Surgical Action Recognition with Vision-Language Models
Saurav Sharma
Didier Mutter
N. Padoy
VLM
MedIm
252
0
0
25 Mar 2025
An Iterative Feedback Mechanism for Improving Natural Language Class Descriptions in Open-Vocabulary Object Detection
Louis Y. Kim
Michelle Karker
Victoria Valledor
Seiyoung C. Lee
Karl F. Brzoska
Margaret Duff
Anthony Palladino
VLM
ObjD
227
1
0
21 Mar 2025
Representational Similarity via Interpretable Visual Concepts
International Conference on Learning Representations (ICLR), 2025
Neehar Kondapaneni
Oisin Mac Aodha
Pietro Perona
DRL
985
3
0
19 Mar 2025
CoE: Chain-of-Explanation via Automatic Visual Concept Circuit Description and Polysemanticity Quantification
Computer Vision and Pattern Recognition (CVPR), 2025
Wenlong Yu
Qilong Wang
Chuang Liu
Dong Li
Q. Hu
LRM
306
2
0
19 Mar 2025
Backdooring CLIP through Concept Confusion
Lijie Hu
Junchi Liao
Weimin Lyu
Shaopeng Fu
Tianhao Huang
Shu Yang
Guimin Hu
Di Wang
AAML
332
1
0
12 Mar 2025
Interpreting CLIP with Hierarchical Sparse Autoencoders
Vladimir Zaigrajew
Hubert Baniecki
P. Biecek
491
15
0
27 Feb 2025
The Role of Background Information in Reducing Object Hallucination in Vision-Language Models: Insights from Cutoff API Prompting
Masayo Tomita
Katsuhiko Hayashi
Tomoyuki Kaneko
VLM
191
0
0
24 Feb 2025
Narrowing Information Bottleneck Theory for Multimodal Image-Text Representations Interpretability
International Conference on Learning Representations (ICLR), 2025
Zhiyu Zhu
Zhibo Jin
Jiayu Zhang
Nan Yang
Jiahao Huang
Jianlong Zhou
Fang Chen
269
3
0
16 Feb 2025
Trustworthy AI: Safety, Bias, and Privacy -- A Survey
Xingli Fang
Jianwei Li
Varun Mulchandani
Jung-Eun Kim
379
0
0
11 Feb 2025
Padding Tone: A Mechanistic Analysis of Padding Tokens in T2I Models
North American Chapter of the Association for Computational Linguistics (NAACL), 2025
Michael Toker
Ido Galil
Hadas Orgad
Rinon Gal
Yoad Tewel
Gal Chechik
Yonatan Belinkov
DiffM
242
8
0
12 Jan 2025
Visual Large Language Models for Generalized and Specialized Applications
Jiayi Zhang
Zhixin Lai
Wentao Bao
Zhen Tan
Anh Dao
Kewei Sui
Jiayi Shen
Dong Liu
Huan Liu
Yu Kong
VLM
465
33
0
06 Jan 2025
A Review of Multimodal Explainable Artificial Intelligence: Past, Present and Future
Shilin Sun
Wenbin An
Feng Tian
Fang Nan
Qidong Liu
Jing Liu
N. Shah
Ping Chen
389
20
0
18 Dec 2024
Attention Head Purification: A New Perspective to Harness CLIP for Domain Generalization
Yingfan Wang
Guoliang Kang
VLM
385
3
0
10 Dec 2024
Language Model as Visual Explainer
Neural Information Processing Systems (NeurIPS), 2024
Xingyi Yang
Xinchao Wang
VLM
209
1
0
08 Dec 2024
Explainable and Interpretable Multimodal Large Language Models: A Comprehensive Survey
Yunkai Dang
Kaichen Huang
Jiahao Huo
Yibo Yan
Shijie Huang
...
Kun Wang
Yong Liu
Jing Shao
Hui Xiong
Xuming Hu
LRM
430
52
0
03 Dec 2024
Interpreting Object-level Foundation Models via Visual Precision Search
Computer Vision and Pattern Recognition (CVPR), 2024
Ruoyu Chen
Yaning Tan
Jingzhi Li
Shiming Liu
Maosen Li
Zheng Huang
Qichuan Geng
Xiaochun Cao
FAtt
580
16
0
25 Nov 2024
Devils in Middle Layers of Large Vision-Language Models: Interpreting, Detecting and Mitigating Object Hallucinations via Attention Lens
Computer Vision and Pattern Recognition (CVPR), 2024
Zhangqi Jiang
Junkai Chen
Beier Zhu
Tingjin Luo
Yankun Shen
Xu Yang
532
53
0
23 Nov 2024
ITACLIP: Boosting Training-Free Semantic Segmentation with Image, Text, and Architectural Enhancements
M. Arda Aydın
Efe Mert Çırpar
Elvin Abdinli
Gözde B. Ünal
Y. Sahin
VLM
645
3
0
18 Nov 2024
Understanding Multimodal LLMs: the Mechanistic Interpretability of Llava in Visual Question Answering
Zeping Yu
Sophia Ananiadou
1.1K
9
0
17 Nov 2024
Decoding Dark Matter: Specialized Sparse Autoencoders for Interpreting Rare Concepts in Foundation Models
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Aashiq Muhamed
Mona Diab
Virginia Smith
248
8
0
01 Nov 2024
Beyond Accuracy: Ensuring Correct Predictions With Correct Rationales
Neural Information Processing Systems (NeurIPS), 2024
Tang Li
Mengmeng Ma
Xi Peng
387
3
0
31 Oct 2024
ResiDual Transformer Alignment with Spectral Decomposition
Lorenzo Basile
Valentino Maiorca
Luca Bortolussi
Emanuele Rodolà
Francesco Locatello
560
4
0
31 Oct 2024
Focus On This, Not That! Steering LLMs with Adaptive Feature Specification
Tom A. Lamb
Adam Davies
Alasdair Paren
Juil Sock
Francesco Pinto
548
4
0
30 Oct 2024
MoH: Multi-Head Attention as Mixture-of-Head Attention
International Conference on Machine Learning (ICML), 2024
Peng Jin
Bo Zhu
Li Yuan
Shuicheng Yan
MoE
416
39
0
15 Oct 2024
Robust AI-Generated Text Detection by Restricted Embeddings
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Kristian Kuznetsov
Eduard Tulchinskii
Laida Kushnareva
German Magai
Serguei Barannikov
Sergey I. Nikolenko
Irina Piontkovskaya
DeLMO
184
15
0
10 Oct 2024
Towards Interpreting Visual Information Processing in Vision-Language Models
International Conference on Learning Representations (ICLR), 2024
Philip Quirke
Luke Ong
Juil Sock
Mor Geva
David M. Krueger
Fazl Barez
536
49
0
09 Oct 2024
Interpreting and Editing Vision-Language Representations to Mitigate Hallucinations
International Conference on Learning Representations (ICLR), 2024
Nick Jiang
Anish Kachinthaya
Suzie Petryk
Yossi Gandelsman
VLM
413
62
0
03 Oct 2024
Explanation Bottleneck Models
AAAI Conference on Artificial Intelligence (AAAI), 2024
Shinýa Yamaguchi
Kosuke Nishida
LRM
BDL
379
4
0
26 Sep 2024
Attention Prompting on Image for Large Vision-Language Models
European Conference on Computer Vision (ECCV), 2024
Runpeng Yu
Weihao Yu
Xinchao Wang
VLM
398
28
0
25 Sep 2024
Quantifying and Enabling the Interpretability of CLIP-like Models
Avinash Madasu
Yossi Gandelsman
Vasudev Lal
Phillip Howard
VLM
224
3
0
10 Sep 2024
Graph-based Unsupervised Disentangled Representation Learning via Multimodal Large Language Models
Baao Xie
Qiuyu Chen
Yunnan Wang
Zequn Zhang
Xin Jin
Wenjun Zeng
OffRL
265
7
0
26 Jul 2024
ClearCLIP: Decomposing CLIP Representations for Dense Vision-Language Inference
Mengcheng Lan
Chaofeng Chen
Yiping Ke
Xinjiang Wang
Xue Jiang
Wayne Zhang
VLM
334
74
0
17 Jul 2024
Interpretability in Action: Exploratory Analysis of VPT, a Minecraft Agent
Karolis Jucys
George Adamopoulos
Mehrab Hamidi
Stephanie Milani
Mohammad Reza Samsami
Artem Zholus
Sonia Joseph
Blake A. Richards
Irina Rish
Özgür Simsek
302
4
0
16 Jul 2024
FALIP: Visual Prompt as Foveal Attention Boosts CLIP Zero-Shot Performance
Jiedong Zhuang
Jiaqi Hu
Lianrui Mu
Rui Hu
Xiaoyu Liang
Jiangnan Ye
Haoji Hu
CLIP
VLM
330
7
0
08 Jul 2024
AlignedCut: Visual Concepts Discovery on Brain-Guided Universal Feature Space
Huzheng Yang
James Gee
Jianbo Shi
VOS
169
3
0
26 Jun 2024
Transcoders Find Interpretable LLM Feature Circuits
Jacob Dunefsky
Philippe Chlenski
Neel Nanda
220
88
0
17 Jun 2024
MMNeuron: Discovering Neuron-Level Domain-Specific Interpretation in Multimodal Large Language Model
Jiahao Huo
Yibo Yan
Boren Hu
Yutao Yue
Xuming Hu
LRM
MLLM
266
16
0
17 Jun 2024
Concept-skill Transferability-based Data Selection for Large Vision-Language Models
Jaewoo Lee
Boyang Li
Sung Ju Hwang
VLM
298
20
0
16 Jun 2024
Fine-Grained Domain Generalization with Feature Structuralization
Wenlong Yu
Dongyue Chen
Qilong Wang
Qinghua Hu
357
0
0
13 Jun 2024
A Concept-Based Explainability Framework for Large Multimodal Models
Jayneel Parekh
Pegah Khayatan
Mustafa Shukor
A. Newson
Matthieu Cord
270
33
0
12 Jun 2024