2406.03230
Cited By
Defending Large Language Models Against Attacks With Residual Stream Activation Analysis
5 June 2024
Amelia Kawasaki
Andrew Davis
Houssam Abbas
AAML
KELM
Papers citing "Defending Large Language Models Against Attacks With Residual Stream Activation Analysis"
2 / 2 papers shown
Title
AttentionDefense: Leveraging System Prompt Attention for Explainable Defense Against Novel Jailbreaks
Charlotte Siska
Anush Sankaran
AAML
43
0
0
10 Apr 2025
JailBreakV-28K: A Benchmark for Assessing the Robustness of MultiModal Large Language Models against Jailbreak Attacks
Weidi Luo
Siyuan Ma
Xiaogeng Liu
Xiaoyu Guo
Chaowei Xiao
AAML
69
69
0
03 Apr 2024