
Mathematical Models of Computation in Superposition

Abstract

Superposition -- when a neural network represents more ``features'' than it has dimensions -- seems to pose a serious challenge to mechanistically interpreting current AI systems. Existing theory work studies \emph{representational} superposition, where superposition is only used when passing information through bottlenecks. In this work, we present mathematical models of \emph{computation} in superposition, where superposition is actively helpful for efficiently accomplishing the task. We first construct a task of efficiently emulating a circuit that takes the AND of the $\binom{m}{2}$ pairs of each of $m$ features. We construct a 1-layer MLP that uses superposition to perform this task up to $\varepsilon$-error, where the network only requires $\tilde{O}(m^{2/3})$ neurons, even when the input features are \emph{themselves in superposition}. We generalize this construction to arbitrary sparse boolean circuits of low depth, and then construct ``error correction'' layers that allow deep fully-connected networks of width $d$ to emulate circuits of width $\tilde{O}(d^{1.5})$ and \emph{any} polynomial depth. We conclude by providing some potential applications of our work for interpreting neural networks that implement computation in superposition.
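To make the setting concrete, here is a minimal NumPy sketch of the simplest version of the idea: $m$ sparse boolean features are embedded in superposition in a $d$-dimensional vector ($d \ll m$) via random near-orthogonal directions, and a single ReLU neuron reads off the AND of one pair of features. This is only an illustrative toy under assumed dimensions and thresholds, not the paper's construction, which computes all $\binom{m}{2}$ ANDs at once with shared neurons to reach the $\tilde{O}(m^{2/3})$ neuron count.

```python
import numpy as np

rng = np.random.default_rng(0)

m, d = 10_000, 1_000   # m sparse boolean features, stored in only d dimensions (m >> d)
k = 3                  # number of simultaneously active features (sparsity)

# Random embedding directions: with high probability these are nearly orthogonal,
# so m features can coexist in superposition in R^d with small interference.
W = rng.standard_normal((m, d)) / np.sqrt(d)

# Sample a k-sparse boolean input and embed it in superposition.
x = np.zeros(m)
active = rng.choice(m, size=k, replace=False)
x[active] = 1.0
h = x @ W              # superposed representation in R^d

def and_readout(h, i, j):
    """ReLU(x_i_hat + x_j_hat - 1): ~1 iff both features are on, ~0 otherwise,
    up to interference noise of order k / sqrt(d)."""
    xi_hat = W[i] @ h
    xj_hat = W[j] @ h
    return max(xi_hat + xj_hat - 1.0, 0.0)

i, j = active[0], active[1]                                   # both active
off = next(t for t in range(m) if t not in set(active))       # an inactive feature
print(and_readout(h, i, j))    # close to 1  (AND of two active features)
print(and_readout(h, i, off))  # close to 0  (one input is off)
```

The toy already shows why sparsity matters: the readout error is interference noise from the other active features, which stays small only while few features are on at once.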
