arXiv: 2410.19278
Applying sparse autoencoders to unlearn knowledge in language models
25 October 2024
Eoin Farrell, Yeu-Tong Lau, Arthur Conmy
MU

Papers citing "Applying sparse autoencoders to unlearn knowledge in language models" (2 papers)
SAeUron: Interpretable Concept Unlearning in Diffusion Models with Sparse Autoencoders
Bartosz Cywiński, Kamil Deja
DiffM
29 Jan 2025

Tracking the Feature Dynamics in LLM Training: A Mechanistic Study
Yang Xu, Y. Wang, Hao Wang
23 Dec 2024