StructTransform: A Scalable Attack Surface for Safety-Aligned Large Language Models

17 February 2025

Shehel Yoosuf


Main: 15 pages, 7 figures, 6 tables; Bibliography: 4 pages; Appendix: 1 page

Abstract

In this work, we present a series of structure transformation attacks on LLM alignment, in which we encode natural language intent using diverse syntax spaces, ranging from simple structure formats and basic query languages (e.g., SQL) to novel spaces and syntaxes created entirely by LLMs. Our extensive evaluation shows that our simplest attacks achieve a success rate close to 90%, even on strict LLMs (such as Claude 3.5 Sonnet) that use SOTA alignment mechanisms. We further improve attack performance with an adaptive scheme that combines structure transformations with existing content transformations, resulting in an attack success rate (ASR) of over 96% with 0% refusals.
