ResearchTrend.AI
Defending Large Language Models Against Attacks With Residual Stream Activation Analysis


5 June 2024
Amelia Kawasaki
Andrew Davis
Houssam Abbas
    AAML
    KELM

Papers citing "Defending Large Language Models Against Attacks With Residual Stream Activation Analysis"

2 / 2 papers shown
AttentionDefense: Leveraging System Prompt Attention for Explainable Defense Against Novel Jailbreaks
Charlotte Siska
Anush Sankaran
AAML
43
0
0
10 Apr 2025
JailBreakV-28K: A Benchmark for Assessing the Robustness of MultiModal Large Language Models against Jailbreak Attacks
Weidi Luo
Siyuan Ma
Xiaogeng Liu
Xiaoyu Guo
Chaowei Xiao
AAML
69
69
0
03 Apr 2024