Dual Filter: A Mathematical Framework for Inference using Transformer-like Architectures

Abstract

This paper presents a mathematical framework for causal nonlinear prediction in settings where observations are generated from an underlying hidden Markov model (HMM). Both the problem formulation and the proposed solution are motivated by the decoder-only transformer architecture, in which a finite sequence of observations (tokens) is mapped to the conditional probability of the next token. Our objective is not to construct a mathematical model of a transformer. Rather, our interest lies in deriving, from first principles, transformer-like architectures that solve the prediction problem for which the transformer is designed. The proposed framework is based on an original optimal control approach, in which the prediction objective (MMSE) is reformulated as an optimal control problem. An analysis of the optimal control problem is presented, leading to a fixed-point equation on the space of probability measures. To solve the fixed-point equation, we introduce the dual filter, an iterative algorithm that closely parallels the architecture of decoder-only transformers. These parallels are discussed in detail, along with the relationship to prior work on mathematical modeling of transformers as transport on the space of probability measures. Numerical experiments are provided to illustrate the performance of the algorithm using parameter values representative of research-scale transformer models.
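To make the prediction problem concrete, the following sketch shows the classical baseline for this setting: the standard HMM forward filter used to compute the conditional probability of the next token given a finite observation sequence. This is not the paper's dual filter; it is a minimal illustration of the causal next-token prediction task the abstract describes, with hypothetical transition and emission matrices.

```python
import numpy as np

def next_token_probs(A, B, tokens, prior):
    """P(next token | observed tokens) for an HMM via the forward filter.

    A: (m, m) hidden-state transition matrix, A[i, j] = P(x' = j | x = i)
    B: (m, d) emission matrix, B[i, z] = P(token z | hidden state i)
    tokens: observed token sequence (list of ints in 0..d-1)
    prior: (m,) prior distribution over the initial hidden state
    """
    pi = prior.copy()
    for z in tokens:
        pi = pi * B[:, z]      # condition on the observed token z
        pi = pi / pi.sum()     # normalize the posterior over hidden states
        pi = A.T @ pi          # propagate one step forward in time
    return B.T @ pi            # marginalize to a distribution over the next token

# Hypothetical example parameters (2 hidden states, 2 token types).
A = np.array([[0.9, 0.1],
              [0.2, 0.8]])
B = np.array([[0.7, 0.3],
              [0.1, 0.9]])
prior = np.array([0.5, 0.5])

p = next_token_probs(A, B, [0, 1, 1], prior)  # distribution over the next token
```

The dual filter proposed in the paper addresses the same prediction objective, but arrives at a transformer-like iterative architecture via the optimal control reformulation rather than this direct recursion.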

@article{chang2025_2505.00818,
  title={Dual Filter: A Mathematical Framework for Inference using Transformer-like Architectures},
  author={Heng-Sheng Chang and Prashant G. Mehta},
  journal={arXiv preprint arXiv:2505.00818},
  year={2025}
}