ResearchTrend.AI
Eraser: Jailbreaking Defense in Large Language Models via Unlearning Harmful Knowledge

8 April 2024
Weikai Lu, Ziqian Zeng, Jianwei Wang, Zhengdong Lu, Zelin Chen, Huiping Zhuang, Cen Chen
MU, AAML, KELM
arXiv:2404.05880

Papers citing "Eraser: Jailbreaking Defense in Large Language Models via Unlearning Harmful Knowledge"

9 / 9 papers shown
ReLearn: Unlearning via Learning for Large Language Models
Haoming Xu, Ningyuan Zhao, Liming Yang, Sendong Zhao, Shumin Deng, Mengru Wang, Bryan Hooi, Nay Oo, H. Chen, N. Zhang
KELM, CLL, MU
85
0
0
16 Feb 2025
Unified Parameter-Efficient Unlearning for LLMs
Chenlu Ding, Jiancan Wu, Yancheng Yuan, Jinda Lu, Kai Zhang, Alex Su, Xiang Wang, Xiangnan He
MU, KELM
100
6
0
30 Nov 2024
Dissecting Fine-Tuning Unlearning in Large Language Models
Yihuai Hong, Yuelin Zou, Lijie Hu, Ziqian Zeng, Di Wang, Haiqin Yang
AAML, MU
37
2
0
09 Oct 2024
Recent Advances in Attack and Defense Approaches of Large Language Models
Jing Cui, Yishi Xu, Zhewei Huang, Shuchang Zhou, Jianbin Jiao, Junge Zhang
PILM, AAML
52
1
0
05 Sep 2024
SelfDefend: LLMs Can Defend Themselves against Jailbreaking in a Practical Manner
Xunguang Wang, Daoyuan Wu, Zhenlan Ji, Zongjie Li, Pingchuan Ma, Shuai Wang, Yingjiu Li, Yang Liu, Ning Liu, Juergen Rahmel
AAML
71
8
0
08 Jun 2024
Towards Urban General Intelligence: A Review and Outlook of Urban Foundation Models
Weijiao Zhang, Jindong Han, Zhao Xu, Hang Ni, Hao Liu, Hui Xiong
AI4CE
77
15
0
30 Jan 2024
Who's Harry Potter? Approximate Unlearning in LLMs
Ronen Eldan, M. Russinovich
MU, MoMe
101
172
0
03 Oct 2023
Knowledge Unlearning for Mitigating Privacy Risks in Language Models
Joel Jang, Dongkeun Yoon, Sohee Yang, Sungmin Cha, Moontae Lee, Lajanugen Logeswaran, Minjoon Seo
KELM, PILM, MU
145
189
0
04 Oct 2022
Training language models to follow instructions with human feedback
Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, ..., Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan J. Lowe
OSLM, ALM
303
11,881
0
04 Mar 2022