
arXiv:2505.16252

Does Localization Inform Unlearning? A Rigorous Examination of Local Parameter Attribution for Knowledge Unlearning in Language Models

22 May 2025
Hwiyeong Lee
Uiji Hwang
Hyelim Lim
Taeuk Kim
    MU

Papers citing "Does Localization Inform Unlearning? A Rigorous Examination of Local Parameter Attribution for Knowledge Unlearning in Language Models"

18 papers shown

SoK: Machine Unlearning for Large Language Models
Jie Ren, Yue Xing, Yingqian Cui, Charu C. Aggarwal, Hui Liu
MU · 10 Jun 2025

Negative Preference Optimization: From Catastrophic Collapse to Effective Unlearning
Ruiqi Zhang, Licong Lin, Yu Bai, Song Mei
MU · 08 Apr 2024

Digital Forgetting in Large Language Models: A Survey of Unlearning Methods. Artificial Intelligence Review (Artif Intell Rev), 2024.
Alberto Blanco-Justicia, N. Jebreel, Benet Manzanares-Salor, David Sánchez, Josep Domingo-Ferrer, Guillem Collell, Kuan Eeik Tan
KELM, MU · 02 Apr 2024

The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning
Nathaniel Li, Alexander Pan, Anjali Gopal, Summer Yue, Daniel Berrios, ..., Yan Shoshitaishvili, Jimmy Ba, K. Esvelt, Alexandr Wang, Dan Hendrycks
ELM · 05 Mar 2024

Eight Methods to Evaluate Robust Unlearning in LLMs
Aengus Lynch, Phillip Guo, Aidan Ewart, Stephen Casper, Dylan Hadfield-Menell
ELM, MU · 26 Feb 2024

TOFU: A Task of Fictitious Unlearning for LLMs
Pratyush Maini, Zhili Feng, Avi Schwarzschild, Zachary Chase Lipton, J. Zico Kolter
MU, CLL · 11 Jan 2024

Do Localization Methods Actually Localize Memorized Data in LLMs? A Tale of Two Benchmarks. North American Chapter of the Association for Computational Linguistics (NAACL), 2023.
Ting-Yun Chang, Jesse Thomason, Robin Jia
15 Nov 2023

Who's Harry Potter? Approximate Unlearning in LLMs
Ronen Eldan, M. Russinovich
MU, MoMe · 03 Oct 2023

Can Sensitive Information Be Deleted From LLMs? Objectives for Defending Against Extraction Attacks. International Conference on Learning Representations (ICLR), 2023.
Vaidehi Patil, Peter Hase, Joey Tianyi Zhou
KELM, AAML · 29 Sep 2023

Use of LLMs for Illicit Purposes: Threats, Prevention Measures, and Vulnerabilities
Maximilian Mozes, Xuanli He, Bennett Kleinberg, Lewis D. Griffin
24 Aug 2023

Direct Preference Optimization: Your Language Model is Secretly a Reward Model. Neural Information Processing Systems (NeurIPS), 2023.
Rafael Rafailov, Archit Sharma, E. Mitchell, Stefano Ermon, Christopher D. Manning, Chelsea Finn
ALM · 29 May 2023

Speak, Memory: An Archaeology of Books Known to ChatGPT/GPT-4. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023.
Kent K. Chang, Mackenzie Cramer, Sandeep Soni, David Bamman
RALM · 28 Apr 2023

Dissecting Recall of Factual Associations in Auto-Regressive Language Models. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023.
Mor Geva, Jasmijn Bastings, Katja Filippova, Amir Globerson
KELM · 28 Apr 2023

Does Localization Inform Editing? Surprising Differences in Causality-Based Localization vs. Knowledge Editing in Language Models. Neural Information Processing Systems (NeurIPS), 2023.
Peter Hase, Joey Tianyi Zhou, Been Kim, Asma Ghandeharioun
MILM · 10 Jan 2023

Knowledge Unlearning for Mitigating Privacy Risks in Language Models. Annual Meeting of the Association for Computational Linguistics (ACL), 2022.
Joel Jang, Dongkeun Yoon, Sohee Yang, Sungmin Cha, Moontae Lee, Lajanugen Logeswaran, Minjoon Seo
KELM, PILM, MU · 04 Oct 2022

Transformer Feed-Forward Layers Build Predictions by Promoting Concepts in the Vocabulary Space. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022.
Mor Geva, Avi Caciularu, Ke Wang, Yoav Goldberg
KELM · 28 Mar 2022

Locating and Editing Factual Associations in GPT. Neural Information Processing Systems (NeurIPS), 2022.
Kevin Meng, David Bau, A. Andonian, Yonatan Belinkov
KELM · 10 Feb 2022

Transformer Feed-Forward Layers Are Key-Value Memories. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020.
Mor Geva, R. Schuster, Jonathan Berant, Omer Levy
KELM · 29 Dec 2020