
Doppelgänger Method: Breaking Role Consistency in LLM Agent via Prompt-based Transferable Adversarial Attack

17 June 2025
Daewon Kang
YeongHwan Shin
Doyeon Kim
Kyu-Hwan Jung
Meong Hi Son
Communities: AAML, SILM
Main: 7 pages · 32 figures · 1 table · Bibliography: 2 pages · Appendix: 28 pages
Abstract

Since the advent of large language models, prompt engineering has enabled the rapid, low-effort creation of diverse autonomous agents that are already in widespread use. Yet this convenience raises urgent concerns about the safety, robustness, and behavioral consistency of the underlying prompts, along with the pressing challenge of preventing those prompts from being extracted by users. In this paper, we propose the "Doppelgänger method" to demonstrate the risk of an agent being hijacked, thereby exposing its system instructions and internal information. Next, we define the "Prompt Alignment Collapse under Adversarial Transfer (PACAT)" level to evaluate vulnerability to this adversarial transfer attack. We also propose a "Caution for Adversarial Transfer (CAT)" prompt to counter the Doppelgänger method. The experimental results demonstrate that the Doppelgänger method can compromise an agent's consistency and expose its internal information. In contrast, CAT prompts enable an effective defense against this adversarial attack.

@article{kang2025_2506.14539,
  title={Doppelg{\"a}nger Method: Breaking Role Consistency in LLM Agent via Prompt-based Transferable Adversarial Attack},
  author={Daewon Kang and YeongHwan Shin and Doyeon Kim and Kyu-Hwan Jung and Meong Hi Son},
  journal={arXiv preprint arXiv:2506.14539},
  year={2025}
}