Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2404.14394
Cited By
A Multimodal Automated Interpretability Agent
22 April 2024
Tamar Rott Shaham
Sarah Schwettmann
Franklin Wang
Achyuta Rajaram
Evan Hernandez
Jacob Andreas
Antonio Torralba
Re-assign community
ArXiv
PDF
HTML
Papers citing
"A Multimodal Automated Interpretability Agent"
19 / 19 papers shown
Title
Classifier-to-Bias: Toward Unsupervised Automatic Bias Detection for Visual Classifiers
Quentin Guimard
Moreno DÍncà
Massimiliano Mancini
Elisa Ricci
SSL
72
0
0
29 Apr 2025
AUTHENTICATION: Identifying Rare Failure Modes in Autonomous Vehicle Perception Systems using Adversarially Guided Diffusion Models
Mohammad Zarei
Melanie A Jutras
Eliana Evans
Mike Tan
Omid Aaramoon
AAML
DiffM
40
0
0
24 Apr 2025
Interpreting the Linear Structure of Vision-language Model Embedding Spaces
Isabel Papadimitriou
Huangyuan Su
Thomas Fel
Naomi Saphra
Sham Kakade
Stephanie Gil
VLM
40
0
0
16 Apr 2025
Uncovering Branch specialization in InceptionV1 using k sparse autoencoders
Matthew Bozoukov
14
0
0
14 Apr 2025
Deceptive Automated Interpretability: Language Models Coordinating to Fool Oversight Systems
Simon Lermen
Mateusz Dziemian
Natalia Pérez-Campanero Antolín
31
0
0
10 Apr 2025
HyperDAS: Towards Automating Mechanistic Interpretability with Hypernetworks
Jiuding Sun
Jing Huang
Sidharth Baskaran
Karel DÓosterlinck
Christopher Potts
Michael Sklar
Atticus Geiger
AI4CE
57
0
0
13 Mar 2025
TinyEmo: Scaling down Emotional Reasoning via Metric Projection
Cristian Gutierrez
LRM
62
0
0
17 Feb 2025
Explainable and Interpretable Multimodal Large Language Models: A Comprehensive Survey
Yunkai Dang
Kaichen Huang
Jiahao Huo
Yibo Yan
S. Huang
...
Kun Wang
Yong Liu
Jing Shao
Hui Xiong
Xuming Hu
LRM
96
14
0
03 Dec 2024
Meta-Models: An Architecture for Decoding LLM Behaviors Through Interpreted Embeddings and Natural Language
Anthony Costarelli
Mat Allen
Severin Field
16
1
0
03 Oct 2024
Interpreting and Editing Vision-Language Representations to Mitigate Hallucinations
Nick Jiang
Anish Kachinthaya
Suzie Petryk
Yossi Gandelsman
VLM
32
14
0
03 Oct 2024
The Quest for the Right Mediator: A History, Survey, and Theoretical Grounding of Causal Interpretability
Aaron Mueller
Jannik Brinkmann
Millicent Li
Samuel Marks
Koyena Pal
...
Arnab Sen Sharma
Jiuding Sun
Eric Todd
David Bau
Yonatan Belinkov
CML
40
18
0
02 Aug 2024
InterpBench: Semi-Synthetic Transformers for Evaluating Mechanistic Interpretability Techniques
Rohan Gupta
Iván Arcuschin
Thomas Kwa
Adrià Garriga-Alonso
40
3
0
19 Jul 2024
Automatic benchmarking of large multimodal models via iterative experiment programming
Alessandro Conti
Enrico Fini
Paolo Rota
Yiming Wang
Massimiliano Mancini
Elisa Ricci
30
0
0
18 Jun 2024
Contrastive Sparse Autoencoders for Interpreting Planning of Chess-Playing Agents
Yoann Poupart
23
0
0
06 Jun 2024
Image-of-Thought Prompting for Visual Reasoning Refinement in Multimodal Large Language Models
Qiji Zhou
Ruochen Zhou
Zike Hu
Panzhong Lu
Siyang Gao
Yue Zhang
LRM
38
12
0
22 May 2024
Finding Neurons in a Haystack: Case Studies with Sparse Probing
Wes Gurnee
Neel Nanda
Matthew Pauly
Katherine Harvey
Dmitrii Troitskii
Dimitris Bertsimas
MILM
153
186
0
02 May 2023
Spawrious: A Benchmark for Fine Control of Spurious Correlation Biases
Aengus Lynch
G. Dovonon
Jean Kaddour
Ricardo M. A. Silva
184
30
0
09 Mar 2023
Natural Language Descriptions of Deep Visual Features
Evan Hernandez
Sarah Schwettmann
David Bau
Teona Bagashvili
Antonio Torralba
Jacob Andreas
MILM
194
116
0
26 Jan 2022
Emerging Properties in Self-Supervised Vision Transformers
Mathilde Caron
Hugo Touvron
Ishan Misra
Hervé Jégou
Julien Mairal
Piotr Bojanowski
Armand Joulin
283
5,723
0
29 Apr 2021
1