
LLMs are Bayesian, In Expectation, Not in Realization

Leon Chlon
Zein Khamis
Maggie Chlon
Mahdi El Zein
MarcAntonio M. Awada
Main: 7 pages
4 figures
Bibliography: 1 page
5 tables
Appendix: 4 pages
Abstract

Exchangeability-based martingale diagnostics have been used to question Bayesian explanations of transformer in-context learning. We show that these violations are compatible with Bayesian/MDL behavior once we account for a basic architectural fact: positional encodings break exchangeability. Accordingly, the relevant baseline is performance in expectation over orderings of an exchangeable multiset, not performance under every fixed ordering. In a Bernoulli microscope (under explicit regularity assumptions), we bound the permutation-induced dispersion detected by martingale diagnostics (Theorem~3.4) while proving near-optimal expected MDL/compression over permutations (Theorem~3.6). Empirically, black-box next-token log-probabilities from an Azure OpenAI deployment exhibit nonzero expectation--realization gaps that decay with context length (mean 0.74 at $n = 10$ to 0.26 at $n = 50$; 95\% confidence intervals), and permutation averaging reduces order-induced standard deviation with a $k^{-1/2}$ trend (Figure~2). Controlled from-scratch training ablations varying only the positional encoding show within-prefix order variance collapsing to $\approx 10^{-16}$ with no positional encoding, but remaining $10^{-8}$--$10^{-6}$ under standard positional encoding schemes (Table~2). Robustness checks extend beyond Bernoulli to categorical sequences, synthetic in-context learning tasks, and evidence-grounded QA with permuted exchangeable evidence chunks.
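
To make the permutation-averaging claim concrete, the following is a minimal, self-contained sketch (not code from the paper): the `sequence_logprob` function and all its parameters are hypothetical stand-ins for a model's order-dependent sequence score, used only to illustrate why averaging over $k$ random permutations of an exchangeable multiset shrinks order-induced dispersion roughly as $k^{-1/2}$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a model's sequence log-probability: positional
# encodings break exchangeability, so the score assigned to an exchangeable
# multiset depends on the ordering. We model it as a fixed mean plus an
# order-dependent perturbation keyed deterministically to the permutation.
def sequence_logprob(permutation, base=-12.0, order_noise=0.5):
    key = hash(tuple(int(x) for x in permutation)) % (2**32)
    local_rng = np.random.default_rng(key)
    return base + order_noise * local_rng.standard_normal()

items = list(range(20))   # an exchangeable multiset of 20 tokens
n_trials = 200            # repeats used to estimate dispersion

for k in (1, 4, 16, 64):
    # Average the score over k random permutations per trial, then measure
    # the standard deviation of that average across trials.
    averages = []
    for _ in range(n_trials):
        perms = [rng.permutation(items) for _ in range(k)]
        averages.append(np.mean([sequence_logprob(p) for p in perms]))
    print(f"k={k:3d}  order-induced std of the k-permutation average: "
          f"{np.std(averages):.4f}")
# The printed std shrinks roughly as k**-0.5, mirroring the permutation-
# averaging trend the abstract attributes to Figure 2.
```

Because the per-permutation perturbations behave like (nearly) independent draws, the standard deviation of the $k$-permutation average scales as $k^{-1/2}$; the sketch only illustrates that statistical mechanism, not the paper's actual measurement pipeline.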
