Does Localization Inform Editing? Surprising Differences in Causality-Based Localization vs. Knowledge Editing in Language Models

Neural Information Processing Systems (NeurIPS), 2023

10 January 2023

Papers citing "Does Localization Inform Editing? Surprising Differences in Causality-Based Localization vs. Knowledge Editing in Language Models"

Showing 50 of 84 citing papers.

Cross-Modal Unlearning via Influential Neuron Path Editing in Multimodal Large Language Models. K. Li, Wenhao Li, Di Wu, Lei Yang, Jun Bai, Ju Jia, Jason Xue. [MU, KELM] 10 Nov 2025
Understanding Robustness of Model Editing in Code LLMs: An Empirical Study. Vinaik Chhetri, A.B. Siddique, Umar Farooq. [KELM] 05 Nov 2025
Balancing Knowledge Updates: Toward Unified Modular Editing in LLMs. Jiahao Liu, Zijian Wang, Kuo Zhao, Dong Hu. [KELM] 31 Oct 2025
From Memorization to Reasoning in the Spectrum of Loss Curvature. Jack Merullo, Srihita Vatsavaya, Lucius Bushnaq, Owen Lewis. 28 Oct 2025
Edit Less, Achieve More: Dynamic Sparse Neuron Masking for Lifelong Knowledge Editing in LLMs. Jinzhe Liu, Junshu Sun, Shufan Shen, Chenxue Yang, Shuhui Wang. [KELM, CLL] 25 Oct 2025
An Empirical Study of Sample Selection Strategies for Large Language Model Repair. Xuran Li, Jingyi Wang. [KELM] 23 Oct 2025
Explainability of Large Language Models: Opportunities and Challenges toward Generating Trustworthy Explanations. Shahin Atakishiyev, H. Babiker, Jiayi Dai, Nawshad Farruque, Teruaki Hayashi, ..., Md Abed Rahman, Iain Smith, Mi-Young Kim, Osmar R. Zaïane, Randy Goebel. [LRM] 20 Oct 2025
Bilinear relational structure fixes reversal curse and enables consistent model editing. Dong-Kyum Kim, Minsung Kim, Jea Kwon, Nakyeong Yang, Meeyoung Cha. [KELM] 26 Sep 2025
Do All Autoregressive Transformers Remember Facts the Same Way? A Cross-Architecture Analysis of Recall Mechanisms. Minyeong Choe, Haehyun Cho, Changho Seo, Hyunil Kim. [KELM, HILM] 10 Sep 2025
Avoiding Knowledge Edit Skipping in Multi-hop Question Answering with Guided Decomposition. Yi Liu, Xiangrong Zhu, Xiangyu Liu, Wei Wei, Wei Hu. [KELM] 09 Sep 2025
Measuring Uncertainty in Transformer Circuits with Effective Information Consistency. Anatoly A. Krasnovsky. 08 Sep 2025
Flexible Feature Distillation for Large Language Models. Khouloud Saadi, Di Wang. 14 Jul 2025
Scalpel vs. Hammer: GRPO Amplifies Existing Capabilities, SFT Replaces Them. Neel Rajani, Aryo Pradipta Gema, Seraphina Goldfarb-Tarrant, Ivan Titov. 13 Jul 2025
Steering Information Utility in Key-Value Memory for Language Model Post-Training. Chunyuan Deng, Ruidi Chang, Hanjie Chen. [LLMSV] 07 Jul 2025
Multiple Streams of Knowledge Retrieval: Enriching and Recalling in Transformers. Todd Nief, David Reber, Sean Richardson, Ari Holtzman. [KELM] 25 Jun 2025
From Concepts to Components: Concept-Agnostic Attention Module Discovery in Transformers. Jingtong Su, Julia Kempe, Karen Ullrich. 20 Jun 2025
Query-Focused Retrieval Heads Improve Long-Context Reasoning and Re-ranking. Wuwei Zhang, Fangcong Yin, Howard Yen, Danqi Chen, Xi Ye. [LRM] 11 Jun 2025
Private Memorization Editing: Turning Memorization into a Defense to Strengthen Data Privacy in Large Language Models. Elena Sofia Ruzzetti, Giancarlo A. Xompero, Davide Venditti, Fabio Massimo Zanzotto. Annual Meeting of the Association for Computational Linguistics (ACL), 2025. [KELM, PILM] 09 Jun 2025
Bridging External and Parametric Knowledge: Mitigating Hallucination of LLMs with Shared-Private Semantic Synergy in Dual-Stream Knowledge. Yi Sui, Chaozhuo Li, Chen Zhang, D. Song, Qiuchi Li. 06 Jun 2025
COMPKE: Complex Question Answering under Knowledge Editing. Keyuan Cheng, Zijian Kan, Zhixian He, Zhuoran Zhang, Muhammad Asif Ali, Ke Xu, Lijie Hu, Di Wang. Annual Meeting of the Association for Computational Linguistics (ACL), 2025. [KELM] 01 Jun 2025
Drop Dropout on Single-Epoch Language Model Pretraining. Houjun Liu, John Bauer, Christopher D. Manning. Annual Meeting of the Association for Computational Linguistics (ACL), 2025. [LRM] 30 May 2025
Paths Not Taken: Understanding and Mending the Multilingual Factual Recall Pipeline. Meng Lu, Ruochen Zhang, Carsten Eickhoff, Ellie Pavlick. [HILM, KELM, LRM] 26 May 2025
Does Localization Inform Unlearning? A Rigorous Examination of Local Parameter Attribution for Knowledge Unlearning in Language Models. Hwiyeong Lee, Uiji Hwang, Hyelim Lim, Taeuk Kim. [MU] 22 May 2025
Unilogit: Robust Machine Unlearning for LLMs Using Uniform-Target Self-Distillation. Stefan Vasilev, Christian Herold, Baohao Liao, Seyyed Hadi Hashemi, Shahram Khadivi, Christof Monz. Annual Meeting of the Association for Computational Linguistics (ACL), 2025. [MU] 09 May 2025
Unlearning Sensitive Information in Multimodal LLMs: Benchmark and Attack-Defense Evaluation Vaidehi Patil Yi-Lin Sung Peter Hase Jie Peng Jen-tse Huang Joey Tianyi Zhou AAML MU 480 6 0 01 May 2025
SUV: Scalable Large Language Model Copyright Compliance with Regularized Selective Unlearning. Tianyang Xu, Xiaoze Liu, Feijie Wu, Xiaoqian Wang, Jing Gao. [MU] 29 Mar 2025
CaKE: Circuit-aware Editing Enables Generalizable Knowledge Learners Yunzhi Yao Jizhan Fang Jia-Chen Gu Ningyu Zhang Shumin Deng Ningyu Zhang Nanyun Peng KELM 326 4 0 20 Mar 2025
Implicit Reasoning in Transformers is Reasoning through Shortcuts. Tianhe Lin, Jian Xie, Siyu Yuan, Deqing Yang. Annual Meeting of the Association for Computational Linguistics (ACL), 2025. [ReLM, LRM] 10 Mar 2025
SAKE: Steering Activations for Knowledge Editing. Marco Scialanga, Thibault Laugel, Vincent Grari, Marcin Detyniecki. Annual Meeting of the Association for Computational Linguistics (ACL), 2025. [KELM, LLMSV] 03 Mar 2025
A Causal Lens for Evaluating Faithfulness Metrics. Kerem Zaman, Shashank Srivastava. 26 Feb 2025
Do Multilingual LLMs Think In English? Lisa Schut, Y. Gal, Sebastian Farquhar. 24 Feb 2025
Robust Concept Erasure Using Task Vectors. Minh Pham, Kelly O. Marshall, Chinmay Hegde, Niv Cohen. 21 Feb 2025
Revealing and Mitigating Over-Attention in Knowledge EditingInternational Conference on Learning Representations (ICLR), 2025 Pinzheng Wang Zecheng Tang Keyan Zhou Junlin Li Qiaoming Zhu Hao Fei KELM 496 4 0 21 Feb 2025
MLaKE: Multilingual Knowledge Editing Benchmark for Large Language Models. Zihao Wei, Jingcheng Deng, Liang Pang, Hanxing Ding, Huawei Shen, Xueqi Cheng. International Conference on Computational Linguistics (COLING), 2024. [KELM] 20 Feb 2025
The Knowledge Microscope: Features as Better Analytical Lenses than Neurons. Yuheng Chen, Pengfei Cao, Kang Liu, Jun Zhao. Annual Meeting of the Association for Computational Linguistics (ACL), 2025. 18 Feb 2025
Elucidating Mechanisms of Demographic Bias in LLMs for Healthcare. Hiba Ahsan, Arnab Sen Sharma, Silvio Amir, David Bau, Byron C. Wallace. 18 Feb 2025
Towards Understanding Fine-Tuning Mechanisms of LLMs via Circuit Analysis Xiang Wang Yan Hu Wenyu Du Reynold Cheng Benyou Wang Difan Zou 446 7 0 17 Feb 2025
Making Sense Of Distributed Representations With Activation Spectroscopy. Kyle Reing, Greg Ver Steeg, Aram Galstyan. 28 Jan 2025
LLMs as Repositories of Factual Knowledge: Limitations and Solutions. Seyed Mahed Mousavi, Simone Alghisi, Giuseppe Riccardi. [KELM] 22 Jan 2025
Localize-and-Stitch: Efficient Model Merging via Sparse Task Arithmetic. Yifei He, Yuzheng Hu, Yong Lin, Tong Zhang, Han Zhao. [FedML, MoMe] 08 Jan 2025
Towards Unifying Interpretability and Control: Evaluation via Intervention. Usha Bhalla, Suraj Srinivas, Asma Ghandeharioun, Himabindu Lakkaraju. 07 Nov 2024
A Implies B: Circuit Analysis in LLMs for Propositional Logical Reasoning. Guan Zhe Hong, Nishanth Dikkala, Enming Luo, Cyrus Rashtchian, Xin Wang, Rina Panigrahy. [OffRL, LRM, NAI] 06 Nov 2024
All or None: Identifiable Linear Properties of Next-token Predictors in Language Modeling. Emanuele Marconato, Sébastien Lachapelle, Sebastian Weichwald, Luigi Gresele. International Conference on Artificial Intelligence and Statistics (AISTATS), 2024. 30 Oct 2024
Learning and Unlearning of Fabricated Knowledge in Language Models. Chen Sun, Nolan Miller, A. Zhmoginov, Max Vladymyrov, Mark Sandler. [KELM, MU] 29 Oct 2024
WAGLE: Strategic Weight Attribution for Effective and Modular Unlearning in Large Language Models. Jinghan Jia, Jiancheng Liu, Yihua Zhang, Parikshit Ram, Nathalie Baracaldo, Sijia Liu. Neural Information Processing Systems (NeurIPS), 2024. [MU] 23 Oct 2024
Fact Recall, Heuristics or Pure Guesswork? Precise Interpretations of Language Models for Fact Completion. Denitsa Saynova, Lovisa Hagström, Moa Johansson, Richard Johansson, Marco Kuhlmann. Annual Meeting of the Association for Computational Linguistics (ACL), 2024. [HILM] 18 Oct 2024
MINER: Mining the Underlying Pattern of Modality-Specific Neurons in Multimodal Large Language Models. Kaichen Huang, Jiahao Huo, Yibo Yan, Kun Wang, Yutao Yue, Xuming Hu. 07 Oct 2024
Position: LLM Unlearning Benchmarks are Weak Measures of Progress. Pratiksha Thaker, Shengyuan Hu, Neil Kale, Yash Maurya, Zhiwei Steven Wu, Virginia Smith. [MU] 03 Oct 2024
AlphaEdit: Null-Space Constrained Knowledge Editing for Language ModelsInternational Conference on Learning Representations (ICLR), 2024 Cunchun Li Houcheng Jiang Kun Wang Yunshan Ma Shi Jie Xiangnan He Tat-Seng Chua Tat-seng Chua KELM 459 130 0 03 Oct 2024
Relation Also Knows: Rethinking the Recall and Editing of Factual Associations in Auto-Regressive Transformer Language Models. Xiyu Liu, Zhengxiao Liu, Naibin Gu, Zheng Lin, Wanli Ma, Ji Xiang, Weiping Wang. AAAI Conference on Artificial Intelligence (AAAI), 2024. [KELM] 27 Aug 2024