Safe Unlearning: A Surprisingly Effective and Generalizable Solution to Defend Against Jailbreak Attacks
arXiv:2407.02855 · 3 July 2024
Zhexin Zhang, Junxiao Yang, Pei Ke, Shiyao Cui, Chujie Zheng, Hongning Wang, Minlie Huang
Tags: AAML, MU
Papers citing "Safe Unlearning: A Surprisingly Effective and Generalizable Solution to Defend Against Jailbreak Attacks" (8 papers shown)
1. FalseReject: A Resource for Improving Contextual Safety and Mitigating Over-Refusals in LLMs via Structured Reasoning
   Zhehao Zhang, Weijie Xu, Fanyou Wu, Chandan K. Reddy · 12 May 2025

2. RAG LLMs are Not Safer: A Safety Analysis of Retrieval-Augmented Generation for Large Language Models
   Bang An, Shiyue Zhang, Mark Dredze · 25 Apr 2025

3. Teaching Large Language Models to Reason through Learning and Forgetting
   Tianwei Ni, Allen Nie, Sapana Chaudhary, Yao Liu, Huzefa Rangwala, Rasool Fakoor · 15 Apr 2025 · Tags: ReLM, CLL, LRM

4. Adversarial Training for Multimodal Large Language Models against Jailbreak Attacks
   Liming Lu, Shuchao Pang, Siyuan Liang, Haotian Zhu, Xiyu Zeng, Aishan Liu, Yunhuai Liu, Yongbin Zhou · 05 Mar 2025 · Tags: AAML

5. LED-Merging: Mitigating Safety-Utility Conflicts in Model Merging with Location-Election-Disjoint
   Qianli Ma, Dongrui Liu, Qian Chen, Linfeng Zhang, Jing Shao · 24 Feb 2025 · Tags: MoMe

6. Surgical, Cheap, and Flexible: Mitigating False Refusal in Language Models via Single Vector Ablation
   Xinpeng Wang, Chengzhi Hu, Paul Röttger, Barbara Plank · 04 Oct 2024

7. Negative Preference Optimization: From Catastrophic Collapse to Effective Unlearning
   Ruiqi Zhang, Licong Lin, Yu Bai, Song Mei · 08 Apr 2024 · Tags: MU

8. GPTFUZZER: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts
   Jiahao Yu, Xingwei Lin, Zheng Yu, Xinyu Xing · 19 Sep 2023 · Tags: SILM