ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

Do Unlearning Methods Remove Information from Language Model Weights?

11 October 2024
Aghyad Deeb
Fabien Roger
    AAML
    MU
arXiv: 2410.08827

Papers citing "Do Unlearning Methods Remove Information from Language Model Weights?"

8 / 8 papers shown
SAEs Can Improve Unlearning: Dynamic Sparse Autoencoder Guardrails for Precision Unlearning in LLMs
Aashiq Muhamed
Jacopo Bonato
Mona Diab
Virginia Smith
MU
11 Apr 2025
Not All Data Are Unlearned Equally
Aravind Krishnan
Siva Reddy
Marius Mosbach
MU
07 Apr 2025
Exact Unlearning of Finetuning Data via Model Merging at Scale
Kevin Kuo
Amrith Rajagopal Setlur
Kartik Srinivas
Aditi Raghunathan
Virginia Smith
MoMe
CLL
MU
06 Apr 2025
A General Framework to Enhance Fine-tuning-based LLM Unlearning
J. Ren
Zhenwei Dai
X. Tang
Hui Liu
Jingying Zeng
...
R. Goutam
Suhang Wang
Yue Xing
Qi He
Hui Liu
MU
25 Feb 2025
Model Tampering Attacks Enable More Rigorous Evaluations of LLM Capabilities
Zora Che
Stephen Casper
Robert Kirk
Anirudh Satheesh
Stewart Slocum
...
Zikui Cai
Bilal Chughtai
Y. Gal
Furong Huang
Dylan Hadfield-Menell
MU
AAML
ELM
03 Feb 2025
Mechanistic Unlearning: Robust Knowledge Unlearning and Editing via Mechanistic Localization
Phillip Guo
Aaquib Syed
Abhay Sheshadri
Aidan Ewart
Gintare Karolina Dziugaite
KELM
MU
16 Oct 2024
An Adversarial Perspective on Machine Unlearning for AI Safety
Jakub Łucki
Boyi Wei
Yangsibo Huang
Peter Henderson
F. Tramèr
Javier Rando
MU
AAML
26 Sep 2024
A Practical Review of Mechanistic Interpretability for Transformer-Based Language Models
Daking Rai
Yilun Zhou
Shi Feng
Abulhair Saparov
Ziyu Yao
02 Jul 2024