Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2411.10950
Cited By
v1
v2 (latest)
Understanding Multimodal LLMs: the Mechanistic Interpretability of Llava in Visual Question Answering
17 November 2024
Zeping Yu
Sophia Ananiadou
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"Understanding Multimodal LLMs: the Mechanistic Interpretability of Llava in Visual Question Answering"
43 / 43 papers shown
Title
Same Task, Different Circuits: Disentangling Modality-Specific Mechanisms in VLMs
Yaniv Nikankin
Dana Arad
Yossi Gandelsman
Yonatan Belinkov
30
0
0
10 Jun 2025
Interpreting Arithmetic Mechanism in Large Language Models through Comparative Neuron Analysis
Zeping Yu
Sophia Ananiadou
LRM
MILM
109
14
0
21 Sep 2024
Understanding Information Storage and Transfer in Multi-modal Large Language Models
Samyadeep Basu
Martin Grayson
C. Morrison
Besmira Nushi
Soheil Feizi
Daniela Massiceti
93
12
0
06 Jun 2024
Hallucination of Multimodal Large Language Models: A Survey
Zechen Bai
Pichao Wang
Tianjun Xiao
Tong He
Zongbo Han
Zheng Zhang
Mike Zheng Shou
VLM
LRM
236
197
0
29 Apr 2024
Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity Tracking
Nikhil Prakash
Tamar Rott Shaham
Tal Haklay
Yonatan Belinkov
David Bau
96
67
0
22 Feb 2024
Visual Hallucinations of Multi-modal Large Language Models
Wen Huang
Hongbin Liu
Minxin Guo
Neil Zhenqiang Gong
MLLM
VLM
113
35
0
22 Feb 2024
How do Large Language Models Learn In-Context? Query and Key Matrices of In-Context Heads are Two Towers for Metric Learning
Zeping Yu
Sophia Ananiadou
102
12
0
05 Feb 2024
A Survey on Hallucination in Large Vision-Language Models
Hanchao Liu
Wenyuan Xue
Yifei Chen
Dapeng Chen
Xiutian Zhao
Ke Wang
Liping Hou
Rong-Zhi Li
Wei Peng
LRM
MLLM
85
137
0
01 Feb 2024
MM-LLMs: Recent Advances in MultiModal Large Language Models
Duzhen Zhang
Yahan Yu
Jiahua Dong
Chenxing Li
Dan Su
Chenhui Chu
Dong Yu
OffRL
LRM
161
213
0
24 Jan 2024
Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs
Shengbang Tong
Zhuang Liu
Yuexiang Zhai
Yi-An Ma
Yann LeCun
Saining Xie
VLM
MLLM
130
349
0
11 Jan 2024
Neuron-Level Knowledge Attribution in Large Language Models
Zeping Yu
Sophia Ananiadou
FAtt
KELM
92
11
0
19 Dec 2023
Successor Heads: Recurring, Interpretable Attention Heads In The Wild
Rhys Gould
Euan Ong
George Ogden
Arthur Conmy
LRM
39
52
0
14 Dec 2023
Is This the Subspace You Are Looking for? An Interpretability Illusion for Subspace Activation Patching
Aleksandar Makelov
Georg Lange
Neel Nanda
51
22
0
28 Nov 2023
Causal Interpretation of Self-Attention in Pre-Trained Transformers
R. Y. Rohekar
Yaniv Gurwicz
Shami Nisimov
MILM
58
19
0
31 Oct 2023
Circuit Component Reuse Across Tasks in Transformer Language Models
Jack Merullo
Carsten Eickhoff
Ellie Pavlick
84
71
0
12 Oct 2023
Interpreting CLIP's Image Representation via Text-Based Decomposition
Yossi Gandelsman
Alexei A. Efros
Jacob Steinhardt
VLM
78
101
0
09 Oct 2023
LLM Lies: Hallucinations are not Bugs, but Features as Adversarial Examples
Jia-Yu Yao
Kun-Peng Ning
Zhen-Hui Liu
Munan Ning
Li Yuan
HILM
LRM
AAML
88
194
0
02 Oct 2023
Analyzing and Mitigating Object Hallucination in Large Vision-Language Models
Yiyang Zhou
Chenhang Cui
Jaehong Yoon
Linjun Zhang
Zhun Deng
Chelsea Finn
Mohit Bansal
Huaxiu Yao
MLLM
164
186
0
01 Oct 2023
Towards Best Practices of Activation Patching in Language Models: Metrics and Methods
Fred Zhang
Neel Nanda
LLMSV
205
115
0
27 Sep 2023
Gender bias and stereotypes in Large Language Models
Hadas Kotek
Rikker Dockum
David Q. Sun
123
245
0
28 Aug 2023
Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla
Tom Lieberum
Matthew Rahtz
János Kramár
Neel Nanda
G. Irving
Rohin Shah
Vladimir Mikulik
103
115
0
18 Jul 2023
A Mechanistic Interpretation of Arithmetic Reasoning in Language Models using Causal Mediation Analysis
Alessandro Stolfo
Yonatan Belinkov
Mrinmaya Sachan
MILM
KELM
LRM
101
54
0
24 May 2023
Label Words are Anchors: An Information Flow Perspective for Understanding In-Context Learning
Lean Wang
Lei Li
Damai Dai
Deli Chen
Hao Zhou
Fandong Meng
Jie Zhou
Xu Sun
177
193
0
23 May 2023
Evaluating Object Hallucination in Large Vision-Language Models
Yifan Li
Yifan Du
Kun Zhou
Jinpeng Wang
Wayne Xin Zhao
Ji-Rong Wen
MLLM
LRM
322
815
0
17 May 2023
Finding Neurons in a Haystack: Case Studies with Sparse Probing
Wes Gurnee
Neel Nanda
Matthew Pauly
Katherine Harvey
Dmitrii Troitskii
Dimitris Bertsimas
MILM
276
218
0
02 May 2023
How does GPT-2 compute greater-than?: Interpreting mathematical abilities in a pre-trained language model
Michael Hanna
Ollie Liu
Alexandre Variengien
LRM
304
132
0
30 Apr 2023
Towards Automated Circuit Discovery for Mechanistic Interpretability
Arthur Conmy
Augustine N. Mavor-Parker
Aengus Lynch
Stefan Heimersheim
Adrià Garriga-Alonso
68
319
0
28 Apr 2023
Dissecting Recall of Factual Associations in Auto-Regressive Language Models
Mor Geva
Jasmijn Bastings
Katja Filippova
Amir Globerson
KELM
287
324
0
28 Apr 2023
Visual Instruction Tuning
Haotian Liu
Chunyuan Li
Qingyang Wu
Yong Jae Lee
SyDa
VLM
MLLM
577
4,936
0
17 Apr 2023
Can ChatGPT Replace Traditional KBQA Models? An In-depth Analysis of the Question Answering Performance of the GPT LLM Family
Yiming Tan
Dehai Min
Y. Li
Wenbo Li
Nan Hu
Yongrui Chen
Guilin Qi
AI4MH
ELM
100
102
0
14 Mar 2023
Larger language models do in-context learning differently
Jerry W. Wei
Jason W. Wei
Yi Tay
Dustin Tran
Albert Webson
...
Xinyun Chen
Hanxiao Liu
Da Huang
Denny Zhou
Tengyu Ma
ReLM
LRM
122
374
0
07 Mar 2023
LLaMA: Open and Efficient Foundation Language Models
Hugo Touvron
Thibaut Lavril
Gautier Izacard
Xavier Martinet
Marie-Anne Lachaux
...
Faisal Azhar
Aurelien Rodriguez
Armand Joulin
Edouard Grave
Guillaume Lample
ALM
PILM
1.6K
13,506
0
27 Feb 2023
Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small
Kevin Wang
Alexandre Variengien
Arthur Conmy
Buck Shlegeris
Jacob Steinhardt
320
563
0
01 Nov 2022
Analyzing Transformers in Embedding Space
Guy Dar
Mor Geva
Ankit Gupta
Jonathan Berant
83
93
0
06 Sep 2022
Transformer Feed-Forward Layers Build Predictions by Promoting Concepts in the Vocabulary Space
Mor Geva
Avi Caciularu
Ke Wang
Yoav Goldberg
KELM
136
389
0
28 Mar 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
907
13,240
0
04 Mar 2022
Locating and Editing Factual Associations in GPT
Kevin Meng
David Bau
A. Andonian
Yonatan Belinkov
KELM
255
1,392
0
10 Feb 2022
Neuron-level Interpretation of Deep NLP Models: A Survey
Hassan Sajjad
Nadir Durrani
Fahim Dalvi
MILM
AI4CE
106
85
0
30 Aug 2021
Knowledge Neurons in Pretrained Transformers
Damai Dai
Li Dong
Y. Hao
Zhifang Sui
Baobao Chang
Furu Wei
KELM
MU
128
465
0
18 Apr 2021
Learning Transferable Visual Models From Natural Language Supervision
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
...
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP
VLM
1.0K
29,964
0
26 Feb 2021
RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models
Samuel Gehman
Suchin Gururangan
Maarten Sap
Yejin Choi
Noah A. Smith
178
1,222
0
24 Sep 2020
Language Models are Few-Shot Learners
Tom B. Brown
Benjamin Mann
Nick Ryder
Melanie Subbiah
Jared Kaplan
...
Christopher Berner
Sam McCandlish
Alec Radford
Ilya Sutskever
Dario Amodei
BDL
932
42,569
0
28 May 2020
Microsoft COCO: Common Objects in Context
Nayeon Lee
Michael Maire
Serge J. Belongie
Lubomir Bourdev
Ross B. Girshick
James Hays
Pietro Perona
Deva Ramanan
C. L. Zitnick
Piotr Dollár
ObjD
449
43,907
0
01 May 2014
1