Understanding Multimodal LLMs: the Mechanistic Interpretability of Llava in Visual Question Answering

17 November 2024

Papers citing "Understanding Multimodal LLMs: the Mechanistic Interpretability of Llava in Visual Question Answering"

Map the Flow: Revealing Hidden Pathways of Information in VideoLLMs Minji Kim Taekyung Kim Bohyung Han 24 0 0 15 Oct 2025
Interpret, prune and distill Donut : towards lightweight VLMs for VQA on document Adnan Ben Mansour Ayoub Karine D. Naccache 16 0 0 30 Sep 2025
REMA: A Unified Reasoning Manifold Framework for Interpreting Large Language Model Bo Li Guanzhi Deng Ronghao Chen Junrong Yue Shuo Zhang Qinghua Zhao Linqi Song Lijie Wen LRM 44 0 0 26 Sep 2025
Same Task, Different Circuits: Disentangling Modality-Specific Mechanisms in VLMs Yaniv Nikankin Dana Arad Yossi Gandelsman Yonatan Belinkov 168 3 0 10 Jun 2025
Interpreting Arithmetic Mechanism in Large Language Models through Comparative Neuron Analysis Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024 Zeping Yu Sophia Ananiadou LRM MILM 147 18 0 21 Sep 2024
Understanding Information Storage and Transfer in Multi-modal Large Language Models Neural Information Processing Systems (NeurIPS), 2024 Samyadeep Basu Martin Grayson C. Morrison Besmira Nushi Soheil Feizi Daniela Massiceti 165 28 0 06 Jun 2024
Hallucination of Multimodal Large Language Models: A Survey Zechen Bai Pichao Wang Tianjun Xiao Tong He Zongbo Han Zheng Zhang Mike Zheng Shou VLM LRM 380 282 0 29 Apr 2024
Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity Tracking Nikhil Prakash Tamar Rott Shaham Tal Haklay Yonatan Belinkov David Bau 158 89 0 22 Feb 2024
Visual Hallucinations of Multi-modal Large Language Models Wen Huang Hongbin Liu Minxin Guo Neil Zhenqiang Gong MLLM VLM 152 50 0 22 Feb 2024
How do Large Language Models Learn In-Context? Query and Key Matrices of In-Context Heads are Two Towers for Metric Learning Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024 Zeping Yu Sophia Ananiadou 149 15 0 05 Feb 2024
A Survey on Hallucination in Large Vision-Language Models Hanchao Liu Wenyuan Xue Yifei Chen Dapeng Chen Xiutian Zhao Ke Wang Liping Hou Rong-Zhi Li Wei Peng LRM MLLM 153 208 0 01 Feb 2024
MM-LLMs: Recent Advances in MultiModal Large Language Models Annual Meeting of the Association for Computational Linguistics (ACL), 2024 Duzhen Zhang Yahan Yu Jiahua Dong Chenxing Li Dan Su Chenhui Chu Dong Yu OffRL LRM 263 285 0 24 Jan 2024
Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs Computer Vision and Pattern Recognition (CVPR), 2024 Shengbang Tong Zhuang Liu Yuexiang Zhai Yi-An Ma Yann LeCun Saining Xie VLM MLLM 300 485 0 11 Jan 2024
Neuron-Level Knowledge Attribution in Large Language Models Zeping Yu Sophia Ananiadou FAtt KELM 155 24 0 19 Dec 2023
Successor Heads: Recurring, Interpretable Attention Heads In The Wild International Conference on Learning Representations (ICLR), 2023 Rhys Gould Euan Ong George Ogden Arthur Conmy LRM 118 60 0 14 Dec 2023
Is This the Subspace You Are Looking for? An Interpretability Illusion for Subspace Activation Patching International Conference on Learning Representations (ICLR), 2023 Aleksandar Makelov Georg Lange Neel Nanda 141 29 0 28 Nov 2023
Causal Interpretation of Self-Attention in Pre-Trained Transformers Neural Information Processing Systems (NeurIPS), 2023 R. Y. Rohekar Yaniv Gurwicz Shami Nisimov MILM 113 27 0 31 Oct 2023
Circuit Component Reuse Across Tasks in Transformer Language Models International Conference on Learning Representations (ICLR), 2023 Jack Merullo Carsten Eickhoff Ellie Pavlick 184 85 0 12 Oct 2023
Interpreting CLIP's Image Representation via Text-Based Decomposition International Conference on Learning Representations (ICLR), 2023 Yossi Gandelsman Alexei A. Efros Jacob Steinhardt VLM 235 139 0 09 Oct 2023
LLM Lies: Hallucinations are not Bugs, but Features as Adversarial Examples Jia-Yu Yao Hai-Jian Ke Zhen-Hui Liu Munan Ning Li Yuan HILM LRM AAML 186 245 0 02 Oct 2023
Analyzing and Mitigating Object Hallucination in Large Vision-Language Models International Conference on Learning Representations (ICLR), 2023 Yiyang Zhou Chenhang Cui Jaehong Yoon Linjun Zhang Zhun Deng Chelsea Finn Mohit Bansal Huaxiu Yao MLLM 216 237 0 01 Oct 2023
Towards Best Practices of Activation Patching in Language Models: Metrics and Methods International Conference on Learning Representations (ICLR), 2023 Fred Zhang Neel Nanda LLMSV 308 149 0 27 Sep 2023
Gender bias and stereotypes in Large Language Models ACM Collective Intelligence Conference (CI), 2023 Hadas Kotek Rikker Dockum David Q. Sun 175 316 0 28 Aug 2023
Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla Tom Lieberum Matthew Rahtz János Kramár Neel Nanda G. Irving Rohin Shah Vladimir Mikulik 201 130 0 18 Jul 2023
A Mechanistic Interpretation of Arithmetic Reasoning in Language Models using Causal Mediation Analysis Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023 Alessandro Stolfo Yonatan Belinkov Mrinmaya Sachan MILM KELM LRM 169 61 0 24 May 2023
Label Words are Anchors: An Information Flow Perspective for Understanding In-Context Learning Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023 Lean Wang Lei Li Damai Dai Deli Chen Hao Zhou Fandong Meng Jie Zhou Xu Sun 227 232 0 23 May 2023
Evaluating Object Hallucination in Large Vision-Language Models Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023 Yifan Li Yifan Du Kun Zhou Jinpeng Wang Wayne Xin Zhao Ji-Rong Wen MLLM LRM 493 1,092 0 17 May 2023
Finding Neurons in a Haystack: Case Studies with Sparse Probing Wes Gurnee Neel Nanda Matthew Pauly Katherine Harvey Dmitrii Troitskii Dimitris Bertsimas MILM 412 261 0 02 May 2023
How does GPT-2 compute greater-than?: Interpreting mathematical abilities in a pre-trained language model Neural Information Processing Systems (NeurIPS), 2023 Michael Hanna Ollie Liu Alexandre Variengien LRM 574 165 0 30 Apr 2023
Towards Automated Circuit Discovery for Mechanistic Interpretability Neural Information Processing Systems (NeurIPS), 2023 Arthur Conmy Augustine N. Mavor-Parker Aengus Lynch Stefan Heimersheim Adrià Garriga-Alonso 238 398 0 28 Apr 2023
Dissecting Recall of Factual Associations in Auto-Regressive Language Models Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023 Mor Geva Jasmijn Bastings Katja Filippova Amir Globerson KELM 420 390 0 28 Apr 2023
Visual Instruction Tuning Neural Information Processing Systems (NeurIPS), 2023 Haotian Liu Chunyuan Li Qingyang Wu Yong Jae Lee SyDa VLM MLLM 680 6,397 0 17 Apr 2023
Can ChatGPT Replace Traditional KBQA Models? An In-depth Analysis of the Question Answering Performance of the GPT LLM Family International Workshop on the Semantic Web (SW), 2023 Yiming Tan Dehai Min Y. Li Wenbo Li Nan Hu Yongrui Chen Guilin Qi AI4MH ELM 189 121 0 14 Mar 2023
Larger language models do in-context learning differently Jerry W. Wei Jason W. Wei Yi Tay Dustin Tran Albert Webson ... Xinyun Chen Hanxiao Liu Da Huang Denny Zhou Tengyu Ma ReLM LRM 178 413 0 07 Mar 2023
LLaMA: Open and Efficient Foundation Language Models Hugo Touvron Thibaut Lavril Gautier Izacard Xavier Martinet Marie-Anne Lachaux ... Faisal Azhar Aurelien Rodriguez Armand Joulin Edouard Grave Guillaume Lample ALM PILM 2.0K 16,002 0 27 Feb 2023
Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small International Conference on Learning Representations (ICLR), 2022 Kevin Wang Alexandre Variengien Arthur Conmy Buck Shlegeris Jacob Steinhardt 456 705 0 01 Nov 2022
Analyzing Transformers in Embedding Space Annual Meeting of the Association for Computational Linguistics (ACL), 2022 Guy Dar Mor Geva Ankit Gupta Jonathan Berant 213 110 0 06 Sep 2022
Transformer Feed-Forward Layers Build Predictions by Promoting Concepts in the Vocabulary Space Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022 Mor Geva Avi Caciularu Ke Wang Yoav Goldberg KELM 338 437 0 28 Mar 2022
Training language models to follow instructions with human feedback Neural Information Processing Systems (NeurIPS), 2022 Long Ouyang Jeff Wu Xu Jiang Diogo Almeida Carroll L. Wainwright ... Amanda Askell Peter Welinder Paul Christiano Jan Leike Ryan J. Lowe OSLM ALM 1.6K 15,777 0 04 Mar 2022
Locating and Editing Factual Associations in GPT Neural Information Processing Systems (NeurIPS), 2022 Kevin Meng David Bau A. Andonian Yonatan Belinkov KELM 582 1,736 0 10 Feb 2022
Neuron-level Interpretation of Deep NLP Models: A Survey Transactions of the Association for Computational Linguistics (TACL), 2021 Hassan Sajjad Nadir Durrani Fahim Dalvi MILM AI4CE 185 93 0 30 Aug 2021
Knowledge Neurons in Pretrained Transformers Annual Meeting of the Association for Computational Linguistics (ACL), 2021 Damai Dai Li Dong Y. Hao Zhifang Sui Baobao Chang Furu Wei KELM MU 329 539 0 18 Apr 2021
Learning Transferable Visual Models From Natural Language Supervision International Conference on Machine Learning (ICML), 2021 Alec Radford Jong Wook Kim Chris Hallacy Aditya A. Ramesh Gabriel Goh ... Amanda Askell Pamela Mishkin Jack Clark Gretchen Krueger Ilya Sutskever CLIP VLM 1.7K 36,378 0 26 Feb 2021
RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models Findings (Findings), 2020 Samuel Gehman Suchin Gururangan Maarten Sap Yejin Choi Noah A. Smith 463 1,376 0 24 Sep 2020
Language Models are Few-Shot Learners Neural Information Processing Systems (NeurIPS), 2020 Tom B. Brown Benjamin Mann Nick Ryder Melanie Subbiah Jared Kaplan ... Christopher Berner Sam McCandlish Alec Radford Ilya Sutskever Dario Amodei BDL 1.6K 48,031 0 28 May 2020
Microsoft COCO: Common Objects in Context European Conference on Computer Vision (ECCV), 2014 Tsung-Yi Lin Michael Maire Serge J. Belongie Lubomir Bourdev Ross B. Girshick James Hays Pietro Perona Deva Ramanan C. L. Zitnick Piotr Dollár ObjD 1.4K 46,630 0 01 May 2014