Disentangling visual and written concepts in CLIP

15 June 2022

Antonio Torralba

Papers citing "Disentangling visual and written concepts in CLIP"

38 papers

Transformation of audio embeddings into interpretable, concept-based representations Alice Zhang Edison Thomaz Lie Lu 27 0 0 18 Apr 2025
Steering CLIP's vision transformer with sparse autoencoders Sonia Joseph Praneet Suresh Ethan Goldfarb Lorenz Hufe Yossi Gandelsman Robert Graham Danilo Bzdok Wojciech Samek Blake A. Richards 51 2 0 11 Apr 2025
SCAM: A Real-World Typographic Robustness Evaluation for Multimodal Foundation Models Justus Westerhoff Erblina Purellku Jakob Hackstein Jonas Loos Leo Pinetzki Lorenz Hufe AAML 28 0 0 07 Apr 2025
DIFFER: Disentangling Identity Features via Semantic Cues for Clothes-Changing Person Re-ID Xin Liang Yogesh S Rawat 83 0 0 28 Mar 2025
Zero-Shot Visual Concept Blending Without Text Guidance Hiroya Makino Takahiro Yamaguchi Hiroyuki Sakai DiffM 40 0 0 27 Mar 2025
LOCATEdit: Graph Laplacian Optimized Cross Attention for Localized Text-Guided Image Editing Achint Soni Meet Soni Sirisha Rambhatla DiffM 54 0 0 27 Mar 2025
Web Artifact Attacks Disrupt Vision Language Models Maan Qraitem Piotr Teterwak Kate Saenko Bryan A. Plummer AAML 73 0 0 17 Mar 2025
Hyperbolic Safety-Aware Vision-Language Models Tobia Poppi Tejaswi Kasarla Pascal Mettes Lorenzo Baraldi Rita Cucchiara VLM MU 56 0 0 15 Mar 2025
New Emerged Security and Privacy of Pre-trained Model: a Survey and Outlook Meng Yang Tianqing Zhu Chi Liu Wanlei Zhou Shui Yu Philip S. Yu AAML ELM PILM 56 1 0 12 Nov 2024
Positive-Augmented Contrastive Learning for Vision-and-Language Evaluation and Training Sara Sarto Nicholas Moratelli Marcella Cornia Lorenzo Baraldi Rita Cucchiara 23 3 0 09 Oct 2024
ArtVLM: Attribute Recognition Through Vision-Based Prefix Language Modeling William Y. Zhu Keren Ye Junjie Ke Jiahui Yu Leonidas J. Guibas P. Milanfar Feng Yang 43 2 0 07 Aug 2024
DisCoM-KD: Cross-Modal Knowledge Distillation via Disentanglement Representation and Adversarial Learning Dino Ienco C. Dantas 25 1 0 05 Aug 2024
Focus, Distinguish, and Prompt: Unleashing CLIP for Efficient and Flexible Scene Text Retrieval Gangyan Zeng Yuan Zhang Jin Wei Dongbao Yang Peng Zhang Yiwen Gao Xugong Qin Yu Zhou VLM CLIP 13 0 0 01 Aug 2024
Whiteboard-of-Thought: Thinking Step-by-Step Across Modalities Sachit Menon Richard Zemel Carl Vondrick LRM 33 1 0 20 Jun 2024
Two Effects, One Trigger: On the Modality Gap, Object Bias, and Information Imbalance in Contrastive Vision-Language Models Simon Schrodi David T. Hoffmann Max Argus Volker Fischer Thomas Brox VLM 50 0 0 11 Apr 2024
ASAP: Interpretable Analysis and Summarization of AI-generated Image Patterns at Scale Jinbin Huang C. L. P. Chen Aditi Mishra Bum Chul Kwon Zhicheng Liu Chris Bryan 37 4 0 03 Apr 2024
Scene Depth Estimation from Traditional Oriental Landscape Paintings Sungho Kang Yeonghyeon Park H. Park Juneho Yi 30 0 0 06 Mar 2024
Closed-Loop Unsupervised Representation Disentanglement with β-VAE Distillation and Diffusion Probabilistic Feedback Xin Jin Bo Li Baao Xie Wenyao Zhang Jinming Liu Ziqiang Li Tao Yang Wenjun Zeng DRL DiffM CoGe 27 7 0 04 Feb 2024
Parrot Captions Teach CLIP to Spot Text Yiqi Lin Conghui He Alex Jinpeng Wang Bin Wang Weijia Li Mike Zheng Shou 20 7 0 21 Dec 2023
Safe-CLIP: Removing NSFW Concepts from Vision-and-Language Models Samuele Poppi Tobia Poppi Federico Cocchi Marcella Cornia Lorenzo Baraldi Rita Cucchiara VLM 19 8 0 27 Nov 2023
Kiki or Bouba? Sound Symbolism in Vision-and-Language Models Morris Alper Hadar Averbuch-Elor 28 10 0 25 Oct 2023
Interpreting CLIP's Image Representation via Text-Based Decomposition Yossi Gandelsman Alexei A. Efros Jacob Steinhardt VLM 16 80 0 09 Oct 2023
Rigorously Assessing Natural Language Explanations of Neurons Jing-ling Huang Atticus Geiger Karel D'Oosterlinck Zhengxuan Wu Christopher Potts MILM 10 25 0 19 Sep 2023
In-Style: Bridging Text and Uncurated Videos with Style Transfer for Text-Video Retrieval Nina Shvetsova Anna Kukleva Bernt Schiele Hilde Kuehne DiffM 23 3 0 16 Sep 2023
Parts of Speech-Grounded Subspaces in Vision-Language Models James Oldfield Christos Tzelepis Yannis Panagakis M. Nicolaou Ioannis Patras 19 9 0 23 May 2023
What does CLIP know about a red circle? Visual prompt engineering for VLMs Aleksandar Shtedritski Christian Rupprecht Andrea Vedaldi VLM MLLM 21 140 0 13 Apr 2023
Defense-Prefix for Preventing Typographic Attacks on CLIP Hiroki Azuma Yusuke Matsui VLM AAML 18 16 0 10 Apr 2023
Zero-shot Model Diagnosis Jinqi Luo Zhaoning Wang Chen Henry Wu Dong Huang Fernando De la Torre VLM 13 20 0 27 Mar 2023
Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation Sara Sarto Manuele Barraco Marcella Cornia Lorenzo Baraldi Rita Cucchiara 13 55 0 21 Mar 2023
SpectralCLIP: Preventing Artifacts in Text-Guided Style Transfer from a Spectral Perspective Zipeng Xu Songlong Xing E. Sangineto N. Sebe CLIP 17 2 0 16 Mar 2023
Teaching CLIP to Count to Ten Roni Paiss Ariel Ephrat Omer Tov Shiran Zada Inbar Mosseri Michal Irani Tali Dekel VLM CLIP 25 88 0 23 Feb 2023
CLIPPO: Image-and-Language Understanding from Pixels Only Michael Tschannen Basil Mustafa N. Houlsby CLIP VLM 19 47 0 15 Dec 2022
CREPE: Can Vision-Language Foundation Models Reason Compositionally? Zixian Ma Jerry Hong Mustafa Omer Gul Mona Gandhi Irena Gao Ranjay Krishna CoGe 18 125 0 13 Dec 2022
Task Bias in Vision-Language Models Sachit Menon I. Chandratreya Carl Vondrick VLM SSL 12 6 0 08 Dec 2022
Disentangled Representation Learning Xin Eric Wang Hong Chen Siao Tang Zihao Wu Wenwu Zhu DRL 19 77 0 21 Nov 2022
What the DAAM: Interpreting Stable Diffusion Using Cross Attention Raphael Tang Linqing Liu Akshat Pandey Zhiying Jiang Gefei Yang K. Kumar Pontus Stenetorp Jimmy J. Lin Ferhan Ture 10 167 0 10 Oct 2022
Zero-Shot Text-to-Image Generation Aditya A. Ramesh Mikhail Pavlov Gabriel Goh Scott Gray Chelsea Voss Alec Radford Mark Chen Ilya Sutskever VLM 253 4,735 0 24 Feb 2021
ImageNet Large Scale Visual Recognition Challenge Olga Russakovsky Jia Deng Hao Su J. Krause S. Satheesh ... A. Karpathy A. Khosla Michael S. Bernstein Alexander C. Berg Li Fei-Fei VLM ObjD 282 39,170 0 01 Sep 2014