PICO: Secure Transformers via Robust Prompt Isolation and Cybersecurity Oversight

We propose a robust transformer architecture designed to prevent prompt injection attacks and ensure secure, reliable response generation. Our PICO (Prompt Isolation and Cybersecurity Oversight) framework structurally separates trusted system instructions from untrusted user inputs by routing them through two channels that are processed independently and merged only by a controlled, gated fusion mechanism. In addition, we integrate a specialized Security Expert Agent within a Mixture-of-Experts (MoE) framework and incorporate a Cybersecurity Knowledge Graph (CKG) to supply domain-specific reasoning. Our training design further ensures that the system prompt branch remains immutable while the rest of the network learns to handle adversarial inputs safely. We present the PICO framework first as a general mathematical formulation, then detail how it maps onto the transformer architecture, and illustrate it with hypothetical case studies, including Policy Puppetry attacks. While the most effective implementation may involve training transformers in a PICO-based way from scratch, we also present a cost-effective fine-tuning approach.
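To make the idea of a gated fusion of two independently processed channels concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the paper's formulation: the abstract does not give the fusion equation, so the gate form g = sigmoid(W [s; u] + b) with output g * s + (1 - g) * u is one common choice picked for illustration. The names `gated_fusion`, `gate_weights`, and `gate_bias` are hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(system_vec, user_vec, gate_weights, gate_bias):
    """Fuse a trusted and an untrusted representation elementwise.

    ASSUMPTION: the gate is g = sigmoid(W [s; u] + b) and the output is
    g * s + (1 - g) * u. This form is illustrative and is not taken from
    the paper.

    system_vec   -- representation from the trusted system-prompt channel
    user_vec     -- representation from the untrusted user-input channel
    gate_weights -- one weight row per output dimension, each of length
                    len(system_vec) + len(user_vec)
    gate_bias    -- one bias per output dimension
    """
    if len(system_vec) != len(user_vec):
        raise ValueError("both channels must have the same dimension")
    concat = list(system_vec) + list(user_vec)
    fused = []
    for i, (s, u) in enumerate(zip(system_vec, user_vec)):
        # Gate pre-activation for dimension i, computed from both channels.
        z = sum(w * x for w, x in zip(gate_weights[i], concat)) + gate_bias[i]
        g = sigmoid(z)
        # g near 1 keeps the system channel; g near 0 keeps the user channel.
        fused.append(g * s + (1.0 - g) * u)
    return fused


# Toy inputs: two-dimensional channel representations.
system_vec = [1.0, 0.0]
user_vec = [0.0, 1.0]
zero_weights = [[0.0] * 4 for _ in range(2)]

# Zero weights and zero bias give g = 0.5, an even blend of both channels.
balanced = gated_fusion(system_vec, user_vec, zero_weights, [0.0, 0.0])

# A large positive bias drives g toward 1, so the system channel dominates.
system_dominant = gated_fusion(system_vec, user_vec, zero_weights, [20.0, 20.0])
```

In this toy setting the gate parameters are fixed by hand. In a trained model they would be learned, and how the gate is constrained so that untrusted input cannot override the system channel is a design question this sketch does not address.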
@article{goertzel2025_2504.21029,
  title={PICO: Secure Transformers via Robust Prompt Isolation and Cybersecurity Oversight},
  author={Ben Goertzel and Paulos Yibelo},
  journal={arXiv preprint arXiv:2504.21029},
  year={2025}
}