JailbreakLens: Interpreting Jailbreak Mechanism in the Lens of Representation and Circuit

17 November 2024

Papers citing "JailbreakLens: Interpreting Jailbreak Mechanism in the Lens of Representation and Circuit"

Title
Making Every Step Effective: Jailbreaking Large Vision-Language Models Through Hierarchical KV Equalization | Shuyang Hao, Yiwei Wang, Bryan Hooi, J. Liu, Muhao Chen, Zi Huang, Yujun Cai | AAML, VLM | 56 0 0 | 14 Mar 2025
Using Mechanistic Interpretability to Craft Adversarial Attacks against Large Language Models | Thomas Winninger, Boussad Addad, Katarzyna Kapusta | AAML | 59 0 0 | 08 Mar 2025
Representation Engineering for Large-Language Models: Survey and Research Challenges | Lukasz Bartoszcze, Sarthak Munshi, Bryan Sukidi, Jennifer Yen, Zejia Yang, David Williams-King, Linh Le, Kosi Asuzu, Carsten Maple | | 95 0 0 | 24 Feb 2025