ResearchTrend.AI

JailbreakLens: Interpreting Jailbreak Mechanism in the Lens of Representation and Circuit

17 November 2024
Zeqing He
Zhibo Wang
Zhixuan Chu
Huiyu Xu
Rui Zheng
Kui Ren
Chun Chen
arXiv: 2411.11114

Papers citing "JailbreakLens: Interpreting Jailbreak Mechanism in the Lens of Representation and Circuit"

3 / 3 papers shown
Making Every Step Effective: Jailbreaking Large Vision-Language Models Through Hierarchical KV Equalization
Shuyang Hao
Yiwei Wang
Bryan Hooi
J. Liu
Muhao Chen
Zi Huang
Yujun Cai
AAML
VLM
14 Mar 2025
Using Mechanistic Interpretability to Craft Adversarial Attacks against Large Language Models
Thomas Winninger
Boussad Addad
Katarzyna Kapusta
AAML
08 Mar 2025
Representation Engineering for Large-Language Models: Survey and Research Challenges
Lukasz Bartoszcze
Sarthak Munshi
Bryan Sukidi
Jennifer Yen
Zejia Yang
David Williams-King
Linh Le
Kosi Asuzu
Carsten Maple
24 Feb 2025