Attention Eclipse: Manipulating Attention to Bypass LLM Safety-Alignment

24 February 2025
Pedram Zaree
Md Abdullah Al Mamun
Quazi Mishkatul Alam
Yue Dong
Ihsen Alouani
Nael B. Abu-Ghazaleh
Abstract

Recent research has shown that carefully crafted jailbreak inputs can induce large language models to produce harmful outputs, despite safety measures such as alignment. It is important to anticipate the range of potential jailbreak attacks to guide effective defenses and accurate assessment of model safety. In this paper, we present a new approach for generating highly effective jailbreak attacks that manipulates the model's attention to selectively strengthen or weaken attention among different parts of the prompt. By harnessing an attention loss, we develop more effective jailbreak attacks that are also transferable. The attacks amplify the success rate of existing jailbreak algorithms including GCG, AutoDAN, and ReNeLLM, while lowering their generation cost (for example, the amplified GCG attack achieves 91.2% ASR, vs. 67.9% for the original attack, on Llama2-7B/AdvBench, using less than a third of the generation time).
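
The abstract refers to attention flowing among different parts of the prompt as the quantity being manipulated. As background only (this is not the paper's attack and implements none of it), the following minimal sketch shows how one might measure the attention mass that one prompt segment places on another using a Hugging Face causal LM; the model name ("gpt2") and the segment split are illustrative assumptions.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model for illustration; the paper evaluates Llama2-7B.
model_name = "gpt2"
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

# Two arbitrary prompt segments (hypothetical example text).
segment_a = "Explain how transformer attention works."
segment_b = " Answer concisely."
ids_a = tok(segment_a, return_tensors="pt").input_ids
ids_b = tok(segment_b, add_special_tokens=False, return_tensors="pt").input_ids
input_ids = torch.cat([ids_a, ids_b], dim=1)

with torch.no_grad():
    out = model(input_ids, output_attentions=True)

# out.attentions is a tuple of (batch, heads, seq, seq) tensors, one per layer.
# Average over layers and heads, then sum the attention that segment-B query
# tokens place on segment-A key tokens.
n_a = ids_a.shape[1]
attn = torch.stack(out.attentions).mean(dim=(0, 2))   # (batch, seq, seq)
mass_b_to_a = attn[0, n_a:, :n_a].sum(dim=-1).mean()
print(f"Mean attention mass from segment B onto segment A: {mass_b_to_a:.3f}")

A scalar of this kind, computed over chosen prompt segments, is the sort of quantity an attention-based loss could target; the paper's actual loss formulation and optimization procedure are described in the full text, not here.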

@article{zaree2025_2502.15334,
  title={Attention Eclipse: Manipulating Attention to Bypass LLM Safety-Alignment},
  author={Pedram Zaree and Md Abdullah Al Mamun and Quazi Mishkatul Alam and Yue Dong and Ihsen Alouani and Nael Abu-Ghazaleh},
  journal={arXiv preprint arXiv:2502.15334},
  year={2025}
}