2501.04952
Open Problems in Machine Unlearning for AI Safety
10 January 2025
Fazl Barez
Tingchen Fu
Ameya Prabhu
Stephen Casper
Amartya Sanyal
Adel Bibi
Aidan O'Gara
Robert Kirk
Ben Bucknall
Tim Fist
Luke Ong
Philip Torr
Kwok-Yan Lam
Robert F. Trager
David M. Krueger
Sören Mindermann
José Hernandez-Orallo
Mor Geva
Y. Gal
MU
Papers citing
"Open Problems in Machine Unlearning for AI Safety"
15 / 15 papers shown
Title
LLM Unlearning Should Be Form-Independent
Xiaotian Ye
Mengqi Zhang
Shu Wu
MU
19
0
0
09 Jun 2025
Distillation Robustifies Unlearning
Bruce W. Lee
Addie Foote
Alex Infanger
Leni Shor
Harish Kamath
Jacob Goldman-Wetzler
Bryce Woodworth
Alex Cloud
Alexander Matt Turner
MU
57
0
0
06 Jun 2025
Invariance Makes LLM Unlearning Resilient Even to Unanticipated Downstream Fine-Tuning
Changsheng Wang
Yihua Zhang
Jinghan Jia
Parikshit Ram
Dennis L. Wei
Yuguang Yao
Soumyadeep Pal
Nathalie Baracaldo
Sijia Liu
MU
67
0
0
02 Jun 2025
Existing Large Language Model Unlearning Evaluations Are Inconclusive
Zhili Feng
Yixuan Even Xu
Alexander Robey
Robert Kirk
Xander Davies
Yarin Gal
Avi Schwarzschild
J. Zico Kolter
MU
ELM
35
0
0
31 May 2025
Unsupervised Word-level Quality Estimation for Machine Translation Through the Lens of Annotators (Dis)agreement
Gabriele Sarti
Vilém Zouhar
Malvina Nissim
Arianna Bisazza
47
0
0
29 May 2025
Precise In-Parameter Concept Erasure in Large Language Models
Yoav Gur-Arieh
Clara Suslik
Yihuai Hong
Fazl Barez
Mor Geva
KELM
MU
101
0
0
28 May 2025
CTRAP: Embedding Collapse Trap to Safeguard Large Language Models from Harmful Fine-Tuning
Biao Yi
Tiansheng Huang
Baolei Zhang
Tong Li
Lihai Nie
Zheli Liu
Li Shen
MU
AAML
84
0
0
22 May 2025
On the creation of narrow AI: hierarchy and nonlocality of neural network skills
Eric J. Michaud
Asher Parker-Sartori
Max Tegmark
111
0
0
21 May 2025
Reward Inside the Model: A Lightweight Hidden-State Reward Model for LLM's Best-of-N sampling
Jizhou Guo
Zhaomin Wu
Philip S. Yu
84
0
0
18 May 2025
Rethinking Memory in AI: Taxonomy, Operations, Topics, and Future Directions
Yiming Du
Wenyu Huang
Danna Zheng
Zhaowei Wang
Sébastien Montella
Mirella Lapata
Kam-Fai Wong
Jeff Z. Pan
KELM
MU
235
5
0
01 May 2025
ZJUKLAB at SemEval-2025 Task 4: Unlearning via Model Merging
Haoming Xu
Shuxun Wang
Yanqiu Zhao
Yi Zhong
Ziyan Jiang
Ningyuan Zhao
Shumin Deng
Hong Chen
N. Zhang
MoMe
MU
182
0
0
27 Mar 2025
Do Sparse Autoencoders Generalize? A Case Study of Answerability
Lovis Heindrich
Philip Torr
Fazl Barez
Veronika Thost
142
2
0
27 Feb 2025
SafeEraser: Enhancing Safety in Multimodal Large Language Models through Multimodal Machine Unlearning
Junkai Chen
Zhijie Deng
Kening Zheng
Yibo Yan
Shuliang Liu
PeiJun Wu
Peijie Jiang
Qingbin Liu
Xuming Hu
MU
112
8
0
18 Feb 2025
ReLearn: Unlearning via Learning for Large Language Models
Haoming Xu
Ningyuan Zhao
Liming Yang
Sendong Zhao
Shumin Deng
Mengru Wang
Bryan Hooi
Nay Oo
Ningyu Zhang
N. Zhang
MU
KELM
CLL
548
3
0
16 Feb 2025
Model Tampering Attacks Enable More Rigorous Evaluations of LLM Capabilities
Zora Che
Stephen Casper
Robert Kirk
Anirudh Satheesh
Stewart Slocum
...
Zikui Cai
Bilal Chughtai
Y. Gal
Furong Huang
Dylan Hadfield-Menell
MU
AAML
ELM
181
7
0
03 Feb 2025