Collapse of Irrelevant Representations (CIR) Ensures Robust and Non-Disruptive LLM Unlearning

Main: 9 pages, 8 figures, 2 tables; Bibliography: 4 pages; Appendix: 3 pages
Abstract
Current unlearning techniques and safety training consistently fail to remove dangerous knowledge from language models. We analyze the root causes of these failures and propose a highly selective technique that unlearns robustly without disrupting general performance.
