arXiv:2501.02629
Layer-Level Self-Exposure and Patch: Affirmative Token Mitigation for Jailbreak Attack Defense
5 January 2025
Yang Ouyang
Hengrui Gu
Shuhang Lin
Wenyue Hua
Jie Peng
B. Kailkhura
Tianlong Chen
Kaixiong Zhou
AAML
Papers citing "Layer-Level Self-Exposure and Patch: Affirmative Token Mitigation for Jailbreak Attack Defense"
1 / 1 papers shown
Title
SafeInt: Shielding Large Language Models from Jailbreak Attacks via Safety-Aware Representation Intervention
Jiaqi Wu
Chen Chen
Chunyan Hou
Xiaojie Yuan
AAML
24 Feb 2025