Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2410.08827
Cited By
Do Unlearning Methods Remove Information from Language Model Weights?
11 October 2024
Aghyad Deeb
Fabien Roger
AAML
MU
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Do Unlearning Methods Remove Information from Language Model Weights?"
8 / 8 papers shown
Title
SAEs
Can
\textit{Can}
Can
Improve Unlearning: Dynamic Sparse Autoencoder Guardrails for Precision Unlearning in LLMs
Aashiq Muhamed
Jacopo Bonato
Mona Diab
Virginia Smith
MU
37
0
0
11 Apr 2025
Not All Data Are Unlearned Equally
Aravind Krishnan
Siva Reddy
Marius Mosbach
MU
39
0
0
07 Apr 2025
Exact Unlearning of Finetuning Data via Model Merging at Scale
Kevin Kuo
Amrith Rajagopal Setlur
Kartik Srinivas
Aditi Raghunathan
Virginia Smith
MoMe
CLL
MU
40
0
0
06 Apr 2025
A General Framework to Enhance Fine-tuning-based LLM Unlearning
J. Ren
Zhenwei Dai
X. Tang
Hui Liu
Jingying Zeng
...
R. Goutam
Suhang Wang
Yue Xing
Qi He
Hui Liu
MU
91
1
0
25 Feb 2025
Model Tampering Attacks Enable More Rigorous Evaluations of LLM Capabilities
Zora Che
Stephen Casper
Robert Kirk
Anirudh Satheesh
Stewart Slocum
...
Zikui Cai
Bilal Chughtai
Y. Gal
Furong Huang
Dylan Hadfield-Menell
MU
AAML
ELM
66
2
0
03 Feb 2025
Mechanistic Unlearning: Robust Knowledge Unlearning and Editing via Mechanistic Localization
Phillip Guo
Aaquib Syed
Abhay Sheshadri
Aidan Ewart
Gintare Karolina Dziugaite
KELM
MU
19
5
0
16 Oct 2024
An Adversarial Perspective on Machine Unlearning for AI Safety
Jakub Łucki
Boyi Wei
Yangsibo Huang
Peter Henderson
F. Tramèr
Javier Rando
MU
AAML
54
31
0
26 Sep 2024
A Practical Review of Mechanistic Interpretability for Transformer-Based Language Models
Daking Rai
Yilun Zhou
Shi Feng
Abulhair Saparov
Ziyu Yao
49
18
0
02 Jul 2024
1