Provably Overwhelming Transformer Models with Designed Inputs

We develop an algorithm which, given a trained transformer model $\mathcal{M}$ as input, as well as a string of tokens $s$ of length $n_{\mathrm{fix}}$ and an integer $n_{\mathrm{free}}$, can generate a mathematical proof that $\mathcal{M}$ is ``overwhelmed'' by $s$, in time and space polynomial in $n_{\mathrm{fix}}$ and $n_{\mathrm{free}}$. We say that $\mathcal{M}$ is ``overwhelmed'' by $s$ when the output of the model evaluated on this string plus any additional string $t$, $\mathcal{M}(s + t)$, is completely insensitive to the value of the string $t$ whenever length($t$) $\leq n_{\mathrm{free}}$. Along the way, we prove a particularly strong worst-case form of ``over-squashing'', which we use to bound the model's behavior. Our technique uses computer-aided proofs to establish this type of operationally relevant guarantee about transformer models. We empirically test our algorithm on a single-layer transformer complete with an attention head, layer-norm, MLP/ReLU layers, and RoPE positional encoding. We believe that this work is a stepping stone towards the difficult task of obtaining useful guarantees for trained transformer models.
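
For concreteness, the ``overwhelmed'' property described above can be written out as a formal condition. The sketch below uses only the notation introduced in the abstract ($\mathcal{M}$, $s$, $t$, $n_{\mathrm{free}}$); the precise notion of the model's output (e.g., next-token logits versus the sampled token) is an assumption here and is fixed in the paper itself:

$\mathcal{M}$ is ``overwhelmed'' by $s$ $\iff$ for all token strings $t, t'$ with length($t$), length($t'$) $\leq n_{\mathrm{free}}$: $\mathcal{M}(s + t) = \mathcal{M}(s + t')$.

That is, the model's output on the concatenation $s + t$ does not depend on which suffix $t$ of length at most $n_{\mathrm{free}}$ is appended.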
@article{stambler2025_2502.06038,
  title   = {Provably Overwhelming Transformer Models with Designed Inputs},
  author  = {Lev Stambler and Seyed Sajjad Nezhadi and Matthew Coudron},
  journal = {arXiv preprint arXiv:2502.06038},
  year    = {2025}
}