2206.07835
Disentangling visual and written concepts in CLIP
15 June 2022
Joanna Materzyńska
Antonio Torralba
David Bau
CoGe
Papers citing "Disentangling visual and written concepts in CLIP"
38 / 38 papers shown
Transformation of audio embeddings into interpretable, concept-based representations
Alice Zhang
Edison Thomaz
Lie Lu
27
0
0
18 Apr 2025
Steering CLIP's vision transformer with sparse autoencoders
Sonia Joseph
Praneet Suresh
Ethan Goldfarb
Lorenz Hufe
Yossi Gandelsman
Robert Graham
Danilo Bzdok
Wojciech Samek
Blake A. Richards
49
2
0
11 Apr 2025
SCAM: A Real-World Typographic Robustness Evaluation for Multimodal Foundation Models
Justus Westerhoff
Erblina Purellku
Jakob Hackstein
Jonas Loos
Leo Pinetzki
Lorenz Hufe
AAML
28
0
0
07 Apr 2025
DIFFER: Disentangling Identity Features via Semantic Cues for Clothes-Changing Person Re-ID
Xin Liang
Yogesh S Rawat
83
0
0
28 Mar 2025
Zero-Shot Visual Concept Blending Without Text Guidance
Hiroya Makino
Takahiro Yamaguchi
Hiroyuki Sakai
DiffM
38
0
0
27 Mar 2025
LOCATEdit: Graph Laplacian Optimized Cross Attention for Localized Text-Guided Image Editing
Achint Soni
Meet Soni
Sirisha Rambhatla
DiffM
51
0
0
27 Mar 2025
Web Artifact Attacks Disrupt Vision Language Models
Maan Qraitem
Piotr Teterwak
Kate Saenko
Bryan A. Plummer
AAML
73
0
0
17 Mar 2025
Hyperbolic Safety-Aware Vision-Language Models
Tobia Poppi
Tejaswi Kasarla
Pascal Mettes
Lorenzo Baraldi
Rita Cucchiara
VLM
MU
54
0
0
15 Mar 2025
New Emerged Security and Privacy of Pre-trained Model: a Survey and Outlook
Meng Yang
Tianqing Zhu
Chi Liu
Wanlei Zhou
Shui Yu
Philip S. Yu
AAML
ELM
PILM
53
1
0
12 Nov 2024
Positive-Augmented Contrastive Learning for Vision-and-Language Evaluation and Training
Sara Sarto
Nicholas Moratelli
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
23
3
0
09 Oct 2024
ArtVLM: Attribute Recognition Through Vision-Based Prefix Language Modeling
William Y. Zhu
Keren Ye
Junjie Ke
Jiahui Yu
Leonidas J. Guibas
P. Milanfar
Feng Yang
43
2
0
07 Aug 2024
DisCoM-KD: Cross-Modal Knowledge Distillation via Disentanglement Representation and Adversarial Learning
Dino Ienco
C. Dantas
25
1
0
05 Aug 2024
Focus, Distinguish, and Prompt: Unleashing CLIP for Efficient and Flexible Scene Text Retrieval
Gangyan Zeng
Yuan Zhang
Jin Wei
Dongbao Yang
Peng Zhang
Yiwen Gao
Xugong Qin
Yu Zhou
VLM
CLIP
13
0
0
01 Aug 2024
Whiteboard-of-Thought: Thinking Step-by-Step Across Modalities
Sachit Menon
Richard Zemel
Carl Vondrick
LRM
30
1
0
20 Jun 2024
Two Effects, One Trigger: On the Modality Gap, Object Bias, and Information Imbalance in Contrastive Vision-Language Models
Simon Schrodi
David T. Hoffmann
Max Argus
Volker Fischer
Thomas Brox
VLM
50
0
0
11 Apr 2024
ASAP: Interpretable Analysis and Summarization of AI-generated Image Patterns at Scale
Jinbin Huang
C. L. P. Chen
Aditi Mishra
Bum Chul Kwon
Zhicheng Liu
Chris Bryan
37
4
0
03 Apr 2024
Scene Depth Estimation from Traditional Oriental Landscape Paintings
Sungho Kang
Yeonghyeon Park
H. Park
Juneho Yi
30
0
0
06 Mar 2024
Closed-Loop Unsupervised Representation Disentanglement with β-VAE Distillation and Diffusion Probabilistic Feedback
Xin Jin
Bo Li
Baao Xie
Wenyao Zhang
Jinming Liu
Ziqiang Li
Tao Yang
Wenjun Zeng
DRL
DiffM
CoGe
27
7
0
04 Feb 2024
Parrot Captions Teach CLIP to Spot Text
Yiqi Lin
Conghui He
Alex Jinpeng Wang
Bin Wang
Weijia Li
Mike Zheng Shou
20
7
0
21 Dec 2023
Safe-CLIP: Removing NSFW Concepts from Vision-and-Language Models
Samuele Poppi
Tobia Poppi
Federico Cocchi
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
VLM
19
8
0
27 Nov 2023
Kiki or Bouba? Sound Symbolism in Vision-and-Language Models
Morris Alper
Hadar Averbuch-Elor
28
10
0
25 Oct 2023
Interpreting CLIP's Image Representation via Text-Based Decomposition
Yossi Gandelsman
Alexei A. Efros
Jacob Steinhardt
VLM
16
80
0
09 Oct 2023
Rigorously Assessing Natural Language Explanations of Neurons
Jing-ling Huang
Atticus Geiger
Karel D'Oosterlinck
Zhengxuan Wu
Christopher Potts
MILM
8
25
0
19 Sep 2023
In-Style: Bridging Text and Uncurated Videos with Style Transfer for Text-Video Retrieval
Nina Shvetsova
Anna Kukleva
Bernt Schiele
Hilde Kuehne
DiffM
23
3
0
16 Sep 2023
Parts of Speech-Grounded Subspaces in Vision-Language Models
James Oldfield
Christos Tzelepis
Yannis Panagakis
M. Nicolaou
Ioannis Patras
19
9
0
23 May 2023
What does CLIP know about a red circle? Visual prompt engineering for VLMs
Aleksandar Shtedritski
Christian Rupprecht
Andrea Vedaldi
VLM
MLLM
21
137
0
13 Apr 2023
Defense-Prefix for Preventing Typographic Attacks on CLIP
Hiroki Azuma
Yusuke Matsui
VLM
AAML
18
16
0
10 Apr 2023
Zero-shot Model Diagnosis
Jinqi Luo
Zhaoning Wang
Chen Henry Wu
Dong Huang
Fernando De la Torre
VLM
11
20
0
27 Mar 2023
Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation
Sara Sarto
Manuele Barraco
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
13
55
0
21 Mar 2023
SpectralCLIP: Preventing Artifacts in Text-Guided Style Transfer from a Spectral Perspective
Zipeng Xu
Songlong Xing
E. Sangineto
N. Sebe
CLIP
17
2
0
16 Mar 2023
Teaching CLIP to Count to Ten
Roni Paiss
Ariel Ephrat
Omer Tov
Shiran Zada
Inbar Mosseri
Michal Irani
Tali Dekel
VLM
CLIP
22
88
0
23 Feb 2023
CLIPPO: Image-and-Language Understanding from Pixels Only
Michael Tschannen
Basil Mustafa
N. Houlsby
CLIP
VLM
19
47
0
15 Dec 2022
CREPE: Can Vision-Language Foundation Models Reason Compositionally?
Zixian Ma
Jerry Hong
Mustafa Omer Gul
Mona Gandhi
Irena Gao
Ranjay Krishna
CoGe
18
124
0
13 Dec 2022
Task Bias in Vision-Language Models
Sachit Menon
I. Chandratreya
Carl Vondrick
VLM
SSL
12
6
0
08 Dec 2022
Disentangled Representation Learning
Xin Eric Wang
Hong Chen
Siao Tang
Zihao Wu
Wenwu Zhu
DRL
19
77
0
21 Nov 2022
What the DAAM: Interpreting Stable Diffusion Using Cross Attention
Raphael Tang
Linqing Liu
Akshat Pandey
Zhiying Jiang
Gefei Yang
K. Kumar
Pontus Stenetorp
Jimmy J. Lin
Ferhan Ture
8
162
0
10 Oct 2022
Zero-Shot Text-to-Image Generation
Aditya A. Ramesh
Mikhail Pavlov
Gabriel Goh
Scott Gray
Chelsea Voss
Alec Radford
Mark Chen
Ilya Sutskever
VLM
253
4,735
0
24 Feb 2021
ImageNet Large Scale Visual Recognition Challenge
Olga Russakovsky
Jia Deng
Hao Su
J. Krause
S. Satheesh
...
A. Karpathy
A. Khosla
Michael S. Bernstein
Alexander C. Berg
Li Fei-Fei
VLM
ObjD
279
39,083
0
01 Sep 2014