AttnGCG: Enhancing Jailbreaking Attacks on LLMs with Attention Manipulation

11 October 2024

Papers citing "AttnGCG: Enhancing Jailbreaking Attacks on LLMs with Attention Manipulation"


Using Mechanistic Interpretability to Craft Adversarial Attacks against Large Language Models
Thomas Winninger, Boussad Addad, Katarzyna Kapusta
AAML · 08 Mar 2025