Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2402.10058
Cited By
Towards Safer Large Language Models through Machine Unlearning
15 February 2024
Zheyuan Liu
Guangyao Dou
Zhaoxuan Tan
Yijun Tian
Meng-Long Jiang
KELM
MU
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Towards Safer Large Language Models through Machine Unlearning"
20 / 20 papers shown
Title
Rethinking Memory in AI: Taxonomy, Operations, Topics, and Future Directions
Yiming Du
Wenyu Huang
Danna Zheng
Zhaowei Wang
Sébastien Montella
Mirella Lapata
Kam-Fai Wong
Jeff Z. Pan
KELM
MU
71
1
0
01 May 2025
Adaptive Helpfulness-Harmlessness Alignment with Preference Vectors
Ren-Wei Liang
Chin-Ting Hsu
Chan-Hung Yu
Saransh Agrawal
Shih-Cheng Huang
Shang-Tse Chen
Kuan-Hao Huang
Shao-Hua Sun
76
0
0
27 Apr 2025
ForgetMe: Evaluating Selective Forgetting in Generative Models
Zhenyu Yu
Mohd Yamani Inda Idris
Pei Wang
DiffM
MU
32
0
0
17 Apr 2025
ZJUKLAB at SemEval-2025 Task 4: Unlearning via Model Merging
Haoming Xu
Shuxun Wang
Yanqiu Zhao
Yi Zhong
Ziyan Jiang
Ningyuan Zhao
Shumin Deng
H. Chen
N. Zhang
MoMe
MU
67
0
0
27 Mar 2025
UIPE: Enhancing LLM Unlearning by Removing Knowledge Related to Forgetting Targets
Wenyu Wang
M. Zhang
Xiaotian Ye
Z. Z. Ren
Z. Chen
Pengjie Ren
MU
KELM
84
0
0
06 Mar 2025
ReLearn: Unlearning via Learning for Large Language Models
Haoming Xu
Ningyuan Zhao
Liming Yang
Sendong Zhao
Shumin Deng
Mengru Wang
Bryan Hooi
Nay Oo
H. Chen
N. Zhang
KELM
CLL
MU
65
0
0
16 Feb 2025
Unified Parameter-Efficient Unlearning for LLMs
Chenlu Ding
Jiancan Wu
Yancheng Yuan
Jinda Lu
Kai Zhang
Alex Su
Xiang Wang
Xiangnan He
MU
KELM
95
6
0
30 Nov 2024
WAGLE: Strategic Weight Attribution for Effective and Modular Unlearning in Large Language Models
Jinghan Jia
Jiancheng Liu
Yihua Zhang
Parikshit Ram
Nathalie Baracaldo
Sijia Liu
MU
35
2
0
23 Oct 2024
An Adversarial Perspective on Machine Unlearning for AI Safety
Jakub Łucki
Boyi Wei
Yangsibo Huang
Peter Henderson
F. Tramèr
Javier Rando
MU
AAML
66
31
0
26 Sep 2024
Alternate Preference Optimization for Unlearning Factual Knowledge in Large Language Models
Anmol Mekala
Vineeth Dorna
Shreya Dubey
Abhishek Lalwani
David Koleczek
Mukund Rungta
Sadid Hasan
Elita Lobo
KELM
MU
31
1
0
20 Sep 2024
Recent Advances in Attack and Defense Approaches of Large Language Models
Jing Cui
Yishi Xu
Zhewei Huang
Shuchang Zhou
Jianbin Jiao
Junge Zhang
PILM
AAML
47
1
0
05 Sep 2024
Composable Interventions for Language Models
Arinbjorn Kolbeinsson
Kyle O'Brien
Tianjin Huang
Shanghua Gao
Shiwei Liu
...
Anurag J. Vaidya
Faisal Mahmood
Marinka Zitnik
Tianlong Chen
Thomas Hartvigsen
KELM
MU
75
5
0
09 Jul 2024
Split, Unlearn, Merge: Leveraging Data Attributes for More Effective Unlearning in LLMs
S. Kadhe
Farhan Ahmed
Dennis Wei
Nathalie Baracaldo
Inkit Padhi
MoMe
MU
21
5
0
17 Jun 2024
Large Scale Knowledge Washing
Yu-Xiang Wang
Ruihan Wu
Zexue He
X. Chen
Julian McAuley
MU
KELM
46
4
0
26 May 2024
SOUL: Unlocking the Power of Second-Order Optimization for LLM Unlearning
Jinghan Jia
Yihua Zhang
Yimeng Zhang
Jiancheng Liu
Bharat Runwal
James Diffenderfer
B. Kailkhura
Sijia Liu
MU
24
31
0
28 Apr 2024
Democratizing Large Language Models via Personalized Parameter-Efficient Fine-tuning
Zhaoxuan Tan
Qingkai Zeng
Yijun Tian
Zheyuan Liu
Bing Yin
Meng-Long Jiang
104
16
0
06 Feb 2024
Who's Harry Potter? Approximate Unlearning in LLMs
Ronen Eldan
M. Russinovich
MU
MoMe
101
171
0
03 Oct 2023
Git Re-Basin: Merging Models modulo Permutation Symmetries
Samuel K. Ainsworth
J. Hayase
S. Srinivasa
MoMe
239
313
0
11 Sep 2022
Large Language Models are Zero-Shot Reasoners
Takeshi Kojima
S. Gu
Machel Reid
Yutaka Matsuo
Yusuke Iwasawa
ReLM
LRM
291
2,712
0
24 May 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
301
11,730
0
04 Mar 2022
1