What can we learn from signals and systems in a transformer? Insights for probabilistic modeling and inference architecture
AI4TS
Main: 20 pages; bibliography: 1 page; 7 figures; 3 tables
Abstract
In the 1940s, Wiener introduced the linear predictor, in which the future value is computed by linearly combining past data. A transformer generalizes this idea: it is a nonlinear predictor in which the next-token prediction is computed by nonlinearly combining the past tokens. In this essay, we present a probabilistic model that interprets transformer signals as surrogates of conditional measures, and layer operations as fixed-point updates. An explicit form of the fixed-point update is described for the special case where the probabilistic model is a hidden Markov model (HMM). In part, this paper is an attempt to bridge classical nonlinear filtering theory with modern inference architectures.
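The abstract does not reproduce the paper's explicit fixed-point update, but in the HMM setting the classical object it points to is the forward-filter recursion on the conditional measure (the posterior over hidden states given past observations). The sketch below is a minimal illustration of that recursion, assuming a finite-state HMM with discrete observations; the function and variable names are hypothetical and this is not the paper's own construction.

```python
import numpy as np

def hmm_filter_step(pi, y, A, B):
    """One step of the classical HMM forward filter.

    pi : (n,) current posterior, pi[i] = P(X_t = i | y_1, ..., y_t)
    y  : observed symbol index at time t+1
    A  : (n, n) transition matrix, A[i, j] = P(X_{t+1} = j | X_t = i)
    B  : (n, m) emission matrix,   B[j, k] = P(Y = k | X = j)
    """
    pred = A.T @ pi           # prediction: propagate posterior through dynamics
    post = B[:, y] * pred     # update: weight by the observation likelihood
    return post / post.sum()  # normalize back to a probability vector

# Toy usage: a 2-state chain with binary observations.
A = np.array([[0.9, 0.1],
              [0.2, 0.8]])
B = np.array([[0.7, 0.3],
              [0.1, 0.9]])
pi = np.array([0.5, 0.5])     # uniform prior over the hidden state
for y in [0, 0, 1, 1, 0]:     # a short observation sequence
    pi = hmm_filter_step(pi, y, A, B)
print(pi)                     # posterior after five observations
```

Iterating this map is the sense in which filtering is a fixed-point update on conditional measures; the essay's reading of a transformer layer is as a nonlinear analogue of such an update applied to token signals.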
