2301.04213
Cited By
Does Localization Inform Editing? Surprising Differences in Causality-Based Localization vs. Knowledge Editing in Language Models
Neural Information Processing Systems (NeurIPS), 2023
10 January 2023
Peter Hase
Mohit Bansal
Been Kim
Asma Ghandeharioun
MILM
Github (61★)
Papers citing
"Does Localization Inform Editing? Surprising Differences in Causality-Based Localization vs. Knowledge Editing in Language Models"
50 / 84 papers shown
Cross-Modal Unlearning via Influential Neuron Path Editing in Multimodal Large Language Models
K. Li
Wenhao Li
Di Wu
Lei Yang
Jun Bai
Ju Jia
Jason Xue
MU
KELM
234
0
0
10 Nov 2025
Understanding Robustness of Model Editing in Code LLMs: An Empirical Study
Vinaik Chhetri
A.B. Siddique
Umar Farooq
KELM
104
0
0
05 Nov 2025
Balancing Knowledge Updates: Toward Unified Modular Editing in LLMs
Jiahao Liu
Zijian Wang
Kuo Zhao
Dong Hu
KELM
124
0
0
31 Oct 2025
From Memorization to Reasoning in the Spectrum of Loss Curvature
Jack Merullo
Srihita Vatsavaya
Lucius Bushnaq
Owen Lewis
170
0
0
28 Oct 2025
Edit Less, Achieve More: Dynamic Sparse Neuron Masking for Lifelong Knowledge Editing in LLMs
Jinzhe Liu
Junshu Sun
Shufan Shen
Chenxue Yang
Shuhui Wang
KELM
CLL
301
1
0
25 Oct 2025
An Empirical Study of Sample Selection Strategies for Large Language Model Repair
Xuran Li
Jingyi Wang
KELM
120
0
0
23 Oct 2025
Explainability of Large Language Models: Opportunities and Challenges toward Generating Trustworthy Explanations
Shahin Atakishiyev
H. Babiker
Jiayi Dai
Nawshad Farruque
Teruaki Hayashi
...
Md Abed Rahman
Iain Smith
Mi-Young Kim
Osmar R. Zaïane
Randy Goebel
LRM
137
0
0
20 Oct 2025
Bilinear relational structure fixes reversal curse and enables consistent model editing
Dong-Kyum Kim
Minsung Kim
Jea Kwon
Nakyeong Yang
Meeyoung Cha
KELM
280
0
0
26 Sep 2025
Do All Autoregressive Transformers Remember Facts the Same Way? A Cross-Architecture Analysis of Recall Mechanisms
Minyeong Choe
Haehyun Cho
Changho Seo
Hyunil Kim
KELM
HILM
126
2
0
10 Sep 2025
Avoiding Knowledge Edit Skipping in Multi-hop Question Answering with Guided Decomposition
Yi Liu
Xiangrong Zhu
Xiangyu Liu
Wei Wei
Wei Hu
KELM
84
0
0
09 Sep 2025
Measuring Uncertainty in Transformer Circuits with Effective Information Consistency
Anatoly A. Krasnovsky
59
0
0
08 Sep 2025
Flexible Feature Distillation for Large Language Models
Khouloud Saadi
Di Wang
205
0
0
14 Jul 2025
Scalpel vs. Hammer: GRPO Amplifies Existing Capabilities, SFT Replaces Them
Neel Rajani
Aryo Pradipta Gema
Seraphina Goldfarb-Tarrant
Ivan Titov
210
6
0
13 Jul 2025
Steering Information Utility in Key-Value Memory for Language Model Post-Training
Chunyuan Deng
Ruidi Chang
Hanjie Chen
LLMSV
326
0
0
07 Jul 2025
Multiple Streams of Knowledge Retrieval: Enriching and Recalling in Transformers
Todd Nief
David Reber
Sean Richardson
Ari Holtzman
KELM
137
0
0
25 Jun 2025
From Concepts to Components: Concept-Agnostic Attention Module Discovery in Transformers
Jingtong Su
Julia Kempe
Karen Ullrich
217
3
0
20 Jun 2025
Query-Focused Retrieval Heads Improve Long-Context Reasoning and Re-ranking
Wuwei Zhang
Fangcong Yin
Howard Yen
Danqi Chen
Xi Ye
LRM
257
4
0
11 Jun 2025
Private Memorization Editing: Turning Memorization into a Defense to Strengthen Data Privacy in Large Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Elena Sofia Ruzzetti
Giancarlo A. Xompero
Davide Venditti
Fabio Massimo Zanzotto
KELM
PILM
233
1
0
09 Jun 2025
Bridging External and Parametric Knowledge: Mitigating Hallucination of LLMs with Shared-Private Semantic Synergy in Dual-Stream Knowledge
Yi Sui
Chaozhuo Li
Chen Zhang
D. Song
Qiuchi Li
158
1
0
06 Jun 2025
COMPKE: Complex Question Answering under Knowledge Editing
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Keyuan Cheng
Zijian Kan
Zhixian He
Zhuoran Zhang
Muhammad Asif Ali
Ke Xu
Lijie Hu
Di Wang
KELM
264
3
0
01 Jun 2025
Drop Dropout on Single-Epoch Language Model Pretraining
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Houjun Liu
John Bauer
Christopher D. Manning
LRM
156
0
0
30 May 2025
Paths Not Taken: Understanding and Mending the Multilingual Factual Recall Pipeline
Meng Lu
Ruochen Zhang
Carsten Eickhoff
Ellie Pavlick
HILM
KELM
LRM
261
6
0
26 May 2025
Does Localization Inform Unlearning? A Rigorous Examination of Local Parameter Attribution for Knowledge Unlearning in Language Models
Hwiyeong Lee
Uiji Hwang
Hyelim Lim
Taeuk Kim
MU
266
1
0
22 May 2025
Unilogit: Robust Machine Unlearning for LLMs Using Uniform-Target Self-Distillation
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Stefan Vasilev
Christian Herold
Baohao Liao
Seyyed Hadi Hashemi
Shahram Khadivi
Christof Monz
MU
868
0
0
09 May 2025
Unlearning Sensitive Information in Multimodal LLMs: Benchmark and Attack-Defense Evaluation
Vaidehi Patil
Yi-Lin Sung
Peter Hase
Jie Peng
Jen-tse Huang
Mohit Bansal
AAML
MU
480
6
0
01 May 2025
SUV: Scalable Large Language Model Copyright Compliance with Regularized Selective Unlearning
Tianyang Xu
Xiaoze Liu
Feijie Wu
Xiaoqian Wang
Jing Gao
MU
479
5
0
29 Mar 2025
CaKE: Circuit-aware Editing Enables Generalizable Knowledge Learners
Yunzhi Yao
Jizhan Fang
Jia-Chen Gu
Ningyu Zhang
Shumin Deng
Nanyun Peng
KELM
326
4
0
20 Mar 2025
Implicit Reasoning in Transformers is Reasoning through Shortcuts
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Tianhe Lin
Jian Xie
Siyu Yuan
Deqing Yang
ReLM
LRM
378
7
0
10 Mar 2025
SAKE: Steering Activations for Knowledge Editing
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Marco Scialanga
Thibault Laugel
Vincent Grari
Marcin Detyniecki
KELM
LLMSV
282
3
0
03 Mar 2025
A Causal Lens for Evaluating Faithfulness Metrics
Kerem Zaman
Shashank Srivastava
361
4
0
26 Feb 2025
Do Multilingual LLMs Think In English?
Lisa Schut
Y. Gal
Sebastian Farquhar
236
42
0
24 Feb 2025
Robust Concept Erasure Using Task Vectors
Minh Pham
Kelly O. Marshall
Chinmay Hegde
Niv Cohen
421
24
0
21 Feb 2025
Revealing and Mitigating Over-Attention in Knowledge Editing
International Conference on Learning Representations (ICLR), 2025
Pinzheng Wang
Zecheng Tang
Keyan Zhou
Junlin Li
Qiaoming Zhu
Hao Fei
KELM
496
4
0
21 Feb 2025
MLaKE: Multilingual Knowledge Editing Benchmark for Large Language Models
International Conference on Computational Linguistics (COLING), 2024
Zihao Wei
Jingcheng Deng
Liang Pang
Hanxing Ding
Huawei Shen
Xueqi Cheng
KELM
236
10
0
20 Feb 2025
The Knowledge Microscope: Features as Better Analytical Lenses than Neurons
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Yuheng Chen
Pengfei Cao
Kang Liu
Jun Zhao
246
4
0
18 Feb 2025
Elucidating Mechanisms of Demographic Bias in LLMs for Healthcare
Hiba Ahsan
Arnab Sen Sharma
Silvio Amir
David Bau
Byron C. Wallace
314
2
0
18 Feb 2025
Towards Understanding Fine-Tuning Mechanisms of LLMs via Circuit Analysis
Xiang Wang
Yan Hu
Wenyu Du
Reynold Cheng
Benyou Wang
Difan Zou
446
7
0
17 Feb 2025
Making Sense Of Distributed Representations With Activation Spectroscopy
Kyle Reing
Greg Ver Steeg
Aram Galstyan
194
0
0
28 Jan 2025
LLMs as Repositories of Factual Knowledge: Limitations and Solutions
Seyed Mahed Mousavi
Simone Alghisi
Giuseppe Riccardi
KELM
242
5
0
22 Jan 2025
Localize-and-Stitch: Efficient Model Merging via Sparse Task Arithmetic
Yifei He
Yuzheng Hu
Yong Lin
Tong Zhang
Han Zhao
FedML
MoMe
303
30
0
08 Jan 2025
Towards Unifying Interpretability and Control: Evaluation via Intervention
Usha Bhalla
Suraj Srinivas
Asma Ghandeharioun
Himabindu Lakkaraju
343
17
0
07 Nov 2024
A Implies B: Circuit Analysis in LLMs for Propositional Logical Reasoning
Guan Zhe Hong
Nishanth Dikkala
Enming Luo
Cyrus Rashtchian
Xin Wang
Rina Panigrahy
OffRL
LRM
NAI
344
0
0
06 Nov 2024
All or None: Identifiable Linear Properties of Next-token Predictors in Language Modeling
International Conference on Artificial Intelligence and Statistics (AISTATS), 2024
Emanuele Marconato
Sébastien Lachapelle
Sebastian Weichwald
Luigi Gresele
324
6
0
30 Oct 2024
Learning and Unlearning of Fabricated Knowledge in Language Models
Chen Sun
Nolan Miller
A. Zhmoginov
Max Vladymyrov
Mark Sandler
KELM
MU
197
4
0
29 Oct 2024
WAGLE: Strategic Weight Attribution for Effective and Modular Unlearning in Large Language Models
Neural Information Processing Systems (NeurIPS), 2024
Jinghan Jia
Jiancheng Liu
Yihua Zhang
Parikshit Ram
Nathalie Baracaldo
Sijia Liu
MU
365
15
0
23 Oct 2024
Fact Recall, Heuristics or Pure Guesswork? Precise Interpretations of Language Models for Fact Completion
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Denitsa Saynova
Lovisa Hagström
Moa Johansson
Richard Johansson
Marco Kuhlmann
HILM
510
2
0
18 Oct 2024
MINER: Mining the Underlying Pattern of Modality-Specific Neurons in Multimodal Large Language Models
Kaichen Huang
Jiahao Huo
Yibo Yan
Kun Wang
Yutao Yue
Xuming Hu
200
2
0
07 Oct 2024
Position: LLM Unlearning Benchmarks are Weak Measures of Progress
Pratiksha Thaker
Shengyuan Hu
Neil Kale
Yash Maurya
Zhiwei Steven Wu
Virginia Smith
MU
320
33
0
03 Oct 2024
AlphaEdit: Null-Space Constrained Knowledge Editing for Language Models
International Conference on Learning Representations (ICLR), 2024
Cunchun Li
Houcheng Jiang
Kun Wang
Yunshan Ma
Shi Jie
Xiangnan He
Tat-Seng Chua
KELM
459
130
0
03 Oct 2024
Relation Also Knows: Rethinking the Recall and Editing of Factual Associations in Auto-Regressive Transformer Language Models
AAAI Conference on Artificial Intelligence (AAAI), 2024
Xiyu Liu
Zhengxiao Liu
Naibin Gu
Zheng Lin
Wanli Ma
Ji Xiang
Weiping Wang
KELM
291
3
0
27 Aug 2024