arXiv:2408.11182 (versions: v1, v2 latest)
Hide Your Malicious Goal Into Benign Narratives: Jailbreak Large Language Models through Carrier Articles
20 August 2024
Zhilong Wang, Haizhou Wang, Nanqing Luo, Lan Zhang, Xiaoyan Sun, Yebo Cao, Peng Liu
Papers citing "Hide Your Malicious Goal Into Benign Narratives: Jailbreak Large Language Models through Carrier Articles":
Layer-Level Self-Exposure and Patch: Affirmative Token Mitigation for Jailbreak Attack Defense
North American Chapter of the Association for Computational Linguistics (NAACL), 2025
Yang Ouyang, Hengrui Gu, Shuhang Lin, Qingfeng Lan, Jie Peng, B. Kailkhura, Tianlong Chen, Kaixiong Zhou
AAML
05 Jan 2025