Understanding Multimodal LLMs: the Mechanistic Interpretability of Llava in Visual Question Answering
Zeping Yu, Sophia Ananiadou
arXiv:2411.10950, 17 November 2024
Papers citing "Understanding Multimodal LLMs: the Mechanistic Interpretability of Llava in Visual Question Answering" (46 papers)
Map the Flow: Revealing Hidden Pathways of Information in VideoLLMs
Minji Kim, Taekyung Kim, Bohyung Han
15 Oct 2025
Interpret, prune and distill Donut: towards lightweight VLMs for VQA on document
Adnan Ben Mansour, Ayoub Karine, D. Naccache
30 Sep 2025
REMA: A Unified Reasoning Manifold Framework for Interpreting Large Language Model
Bo Li, Guanzhi Deng, Ronghao Chen, Junrong Yue, Shuo Zhang, Qinghua Zhao, Linqi Song, Lijie Wen
Communities: LRM
26 Sep 2025
Same Task, Different Circuits: Disentangling Modality-Specific Mechanisms in VLMs
Yaniv Nikankin, Dana Arad, Yossi Gandelsman, Yonatan Belinkov
10 Jun 2025
Interpreting Arithmetic Mechanism in Large Language Models through Comparative Neuron Analysis
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Zeping Yu, Sophia Ananiadou
Communities: LRM, MILM
21 Sep 2024
Understanding Information Storage and Transfer in Multi-modal Large Language Models
Neural Information Processing Systems (NeurIPS), 2024
Samyadeep Basu, Martin Grayson, C. Morrison, Besmira Nushi, Soheil Feizi, Daniela Massiceti
06 Jun 2024
Hallucination of Multimodal Large Language Models: A Survey
Zechen Bai, Pichao Wang, Tianjun Xiao, Tong He, Zongbo Han, Zheng Zhang, Mike Zheng Shou
Communities: VLM, LRM
29 Apr 2024
Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity Tracking
Nikhil Prakash, Tamar Rott Shaham, Tal Haklay, Yonatan Belinkov, David Bau
22 Feb 2024
Visual Hallucinations of Multi-modal Large Language Models
Wen Huang, Hongbin Liu, Minxin Guo, Neil Zhenqiang Gong
Communities: MLLM, VLM
22 Feb 2024
How do Large Language Models Learn In-Context? Query and Key Matrices of In-Context Heads are Two Towers for Metric Learning
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Zeping Yu, Sophia Ananiadou
05 Feb 2024
A Survey on Hallucination in Large Vision-Language Models
Hanchao Liu, Wenyuan Xue, Yifei Chen, Dapeng Chen, Xiutian Zhao, Ke Wang, Liping Hou, Rong-Zhi Li, Wei Peng
Communities: LRM, MLLM
01 Feb 2024
MM-LLMs: Recent Advances in MultiModal Large Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Duzhen Zhang, Yahan Yu, Jiahua Dong, Chenxing Li, Dan Su, Chenhui Chu, Dong Yu
Communities: OffRL, LRM
24 Jan 2024
Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs
Computer Vision and Pattern Recognition (CVPR), 2024
Shengbang Tong, Zhuang Liu, Yuexiang Zhai, Yi-An Ma, Yann LeCun, Saining Xie
Communities: VLM, MLLM
11 Jan 2024
Neuron-Level Knowledge Attribution in Large Language Models
Zeping Yu, Sophia Ananiadou
Communities: FAtt, KELM
19 Dec 2023
Successor Heads: Recurring, Interpretable Attention Heads In The Wild
International Conference on Learning Representations (ICLR), 2024
Rhys Gould, Euan Ong, George Ogden, Arthur Conmy
Communities: LRM
14 Dec 2023
Is This the Subspace You Are Looking for? An Interpretability Illusion for Subspace Activation Patching
International Conference on Learning Representations (ICLR), 2024
Aleksandar Makelov, Georg Lange, Neel Nanda
28 Nov 2023
Causal Interpretation of Self-Attention in Pre-Trained Transformers
Neural Information Processing Systems (NeurIPS), 2023
R. Y. Rohekar, Yaniv Gurwicz, Shami Nisimov
Communities: MILM
31 Oct 2023
Circuit Component Reuse Across Tasks in Transformer Language Models
International Conference on Learning Representations (ICLR), 2024
Jack Merullo, Carsten Eickhoff, Ellie Pavlick
12 Oct 2023
Interpreting CLIP's Image Representation via Text-Based Decomposition
International Conference on Learning Representations (ICLR), 2024
Yossi Gandelsman, Alexei A. Efros, Jacob Steinhardt
Communities: VLM
09 Oct 2023
LLM Lies: Hallucinations are not Bugs, but Features as Adversarial Examples
Jia-Yu Yao, Hai-Jian Ke, Zhen-Hui Liu, Munan Ning, Li Yuan
Communities: HILM, LRM, AAML
02 Oct 2023
Analyzing and Mitigating Object Hallucination in Large Vision-Language Models
International Conference on Learning Representations (ICLR), 2024
Yiyang Zhou, Chenhang Cui, Jaehong Yoon, Linjun Zhang, Zhun Deng, Chelsea Finn, Mohit Bansal, Huaxiu Yao
Communities: MLLM
01 Oct 2023
Towards Best Practices of Activation Patching in Language Models: Metrics and Methods
International Conference on Learning Representations (ICLR), 2024
Fred Zhang, Neel Nanda
Communities: LLMSV
27 Sep 2023
Gender bias and stereotypes in Large Language Models
ACM Collective Intelligence Conference (CI), 2023
Hadas Kotek, Rikker Dockum, David Q. Sun
28 Aug 2023
Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla
Tom Lieberum, Matthew Rahtz, János Kramár, Neel Nanda, G. Irving, Rohin Shah, Vladimir Mikulik
18 Jul 2023
A Mechanistic Interpretation of Arithmetic Reasoning in Language Models using Causal Mediation Analysis
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Alessandro Stolfo, Yonatan Belinkov, Mrinmaya Sachan
Communities: MILM, KELM, LRM
24 May 2023
Label Words are Anchors: An Information Flow Perspective for Understanding In-Context Learning
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Lean Wang, Lei Li, Damai Dai, Deli Chen, Hao Zhou, Fandong Meng, Jie Zhou, Xu Sun
23 May 2023
Evaluating Object Hallucination in Large Vision-Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Yifan Li, Yifan Du, Kun Zhou, Jinpeng Wang, Wayne Xin Zhao, Ji-Rong Wen
Communities: MLLM, LRM
17 May 2023
Finding Neurons in a Haystack: Case Studies with Sparse Probing
Wes Gurnee, Neel Nanda, Matthew Pauly, Katherine Harvey, Dmitrii Troitskii, Dimitris Bertsimas
Communities: MILM
02 May 2023
How does GPT-2 compute greater-than?: Interpreting mathematical abilities in a pre-trained language model
Neural Information Processing Systems (NeurIPS), 2023
Michael Hanna, Ollie Liu, Alexandre Variengien
Communities: LRM
30 Apr 2023
Towards Automated Circuit Discovery for Mechanistic Interpretability
Neural Information Processing Systems (NeurIPS), 2023
Arthur Conmy, Augustine N. Mavor-Parker, Aengus Lynch, Stefan Heimersheim, Adrià Garriga-Alonso
28 Apr 2023
Dissecting Recall of Factual Associations in Auto-Regressive Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Mor Geva, Jasmijn Bastings, Katja Filippova, Amir Globerson
Communities: KELM
28 Apr 2023
Visual Instruction Tuning
Neural Information Processing Systems (NeurIPS), 2023
Haotian Liu, Chunyuan Li, Qingyang Wu, Yong Jae Lee
Communities: SyDa, VLM, MLLM
17 Apr 2023
Can ChatGPT Replace Traditional KBQA Models? An In-depth Analysis of the Question Answering Performance of the GPT LLM Family
International Semantic Web Conference (ISWC), 2023
Yiming Tan, Dehai Min, Y. Li, Wenbo Li, Nan Hu, Yongrui Chen, Guilin Qi
Communities: AI4MH, ELM
14 Mar 2023
Larger language models do in-context learning differently
Jerry W. Wei, Jason W. Wei, Yi Tay, Dustin Tran, Albert Webson, ..., Xinyun Chen, Hanxiao Liu, Da Huang, Denny Zhou, Tengyu Ma
Communities: ReLM, LRM
07 Mar 2023
LLaMA: Open and Efficient Foundation Language Models
Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, ..., Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, Guillaume Lample
Communities: ALM, PILM
27 Feb 2023
Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small
International Conference on Learning Representations (ICLR), 2023
Kevin Wang, Alexandre Variengien, Arthur Conmy, Buck Shlegeris, Jacob Steinhardt
01 Nov 2022
Analyzing Transformers in Embedding Space
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Guy Dar, Mor Geva, Ankit Gupta, Jonathan Berant
06 Sep 2022
Transformer Feed-Forward Layers Build Predictions by Promoting Concepts in the Vocabulary Space
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Mor Geva, Avi Caciularu, Ke Wang, Yoav Goldberg
Communities: KELM
28 Mar 2022
Training language models to follow instructions with human feedback
Neural Information Processing Systems (NeurIPS), 2022
Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, ..., Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan J. Lowe
Communities: OSLM, ALM
04 Mar 2022
Locating and Editing Factual Associations in GPT
Neural Information Processing Systems (NeurIPS), 2022
Kevin Meng, David Bau, A. Andonian, Yonatan Belinkov
Communities: KELM
10 Feb 2022
Neuron-level Interpretation of Deep NLP Models: A Survey
Transactions of the Association for Computational Linguistics (TACL), 2022
Hassan Sajjad, Nadir Durrani, Fahim Dalvi
Communities: MILM, AI4CE
30 Aug 2021
Knowledge Neurons in Pretrained Transformers
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Damai Dai, Li Dong, Y. Hao, Zhifang Sui, Baobao Chang, Furu Wei
Communities: KELM, MU
18 Apr 2021
Learning Transferable Visual Models From Natural Language Supervision
International Conference on Machine Learning (ICML), 2021
Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya A. Ramesh, Gabriel Goh, ..., Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever
Communities: CLIP, VLM
26 Feb 2021
RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models
Findings of the Association for Computational Linguistics: EMNLP, 2020
Samuel Gehman, Suchin Gururangan, Maarten Sap, Yejin Choi, Noah A. Smith
24 Sep 2020
Language Models are Few-Shot Learners
Neural Information Processing Systems (NeurIPS), 2020
Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, ..., Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei
Communities: BDL
28 May 2020
Microsoft COCO: Common Objects in Context
European Conference on Computer Vision (ECCV), 2014
Tsung-Yi Lin, Michael Maire, Serge J. Belongie, Lubomir Bourdev, Ross B. Girshick, James Hays, Pietro Perona, Deva Ramanan, C. L. Zitnick, Piotr Dollár
Communities: ObjD
01 May 2014