2411.11114
Cited By
JailbreakLens: Interpreting Jailbreak Mechanism in the Lens of Representation and Circuit
17 November 2024
Zeqing He
Zhibo Wang
Zhixuan Chu
Huiyu Xu
Rui Zheng
Kui Ren
Chun Chen
Papers citing
"JailbreakLens: Interpreting Jailbreak Mechanism in the Lens of Representation and Circuit"
3 / 3 papers shown
Title
Making Every Step Effective: Jailbreaking Large Vision-Language Models Through Hierarchical KV Equalization
Shuyang Hao
Yiwei Wang
Bryan Hooi
J. Liu
Muhao Chen
Zi Huang
Yujun Cai
AAML
VLM
56
0
0
14 Mar 2025
Using Mechanistic Interpretability to Craft Adversarial Attacks against Large Language Models
Thomas Winninger
Boussad Addad
Katarzyna Kapusta
AAML
59
0
0
08 Mar 2025
Representation Engineering for Large-Language Models: Survey and Research Challenges
Lukasz Bartoszcze
Sarthak Munshi
Bryan Sukidi
Jennifer Yen
Zejia Yang
David Williams-King
Linh Le
Kosi Asuzu
Carsten Maple
95
0
0
24 Feb 2025