Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2505.22586
Cited By
Precise In-Parameter Concept Erasure in Large Language Models
28 May 2025
Yoav Gur-Arieh
Clara Suslik
Yihuai Hong
Fazl Barez
Mor Geva
KELM
MU
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"Precise In-Parameter Concept Erasure in Large Language Models"
10 / 10 papers shown
Title
SAEs
Can
\textit{Can}
Can
Improve Unlearning: Dynamic Sparse Autoencoder Guardrails for Precision Unlearning in LLMs
Aashiq Muhamed
Jacopo Bonato
Mona Diab
Virginia Smith
MU
147
6
0
11 Apr 2025
The Knowledge Microscope: Features as Better Analytical Lenses than Neurons
Yuheng Chen
Pengfei Cao
Kang Liu
Jun Zhao
85
2
0
18 Feb 2025
Open Problems in Machine Unlearning for AI Safety
Fazl Barez
Tingchen Fu
Ameya Prabhu
Stephen Casper
Amartya Sanyal
...
David M. Krueger
Sören Mindermann
José Hernandez-Orallo
Mor Geva
Y. Gal
MU
102
24
0
10 Jan 2025
Do Unlearning Methods Remove Information from Language Model Weights?
Aghyad Deeb
Fabien Roger
AAML
MU
113
29
0
11 Oct 2024
Erasing Conceptual Knowledge from Language Models
Rohit Gandikota
Sheridan Feucht
Samuel Marks
David Bau
KELM
ELM
MU
129
11
0
03 Oct 2024
Position: LLM Unlearning Benchmarks are Weak Measures of Progress
Pratiksha Thaker
Shengyuan Hu
Neil Kale
Yash Maurya
Zhiwei Steven Wu
Virginia Smith
MU
132
25
0
03 Oct 2024
An Adversarial Perspective on Machine Unlearning for AI Safety
Jakub Łucki
Boyi Wei
Yangsibo Huang
Peter Henderson
F. Tramèr
Javier Rando
MU
AAML
206
53
0
26 Sep 2024
Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models
Samuel Marks
Can Rager
Eric J. Michaud
Yonatan Belinkov
David Bau
Aaron Mueller
175
159
0
28 Mar 2024
Threats, Attacks, and Defenses in Machine Unlearning: A Survey
Ziyao Liu
Huanyi Ye
Chen Chen
Yongsen Zheng
K. Lam
AAML
MU
129
32
0
20 Mar 2024
LEACE: Perfect linear concept erasure in closed form
Nora Belrose
David Schneider-Joseph
Shauli Ravfogel
Ryan Cotterell
Edward Raff
Stella Biderman
KELM
MU
182
120
0
06 Jun 2023
1