Bag of Coins: A Statistical Probe into Neural Confidence Structures

26 July 2025

Main: 8 pages; 11 figures; Bibliography: 1 page; 6 tables; Appendix: 7 pages

Abstract

Modern neural networks often produce miscalibrated confidence scores and struggle to detect out-of-distribution (OOD) inputs, while most existing methods post-process outputs without testing internal consistency. We introduce the Bag-of-Coins (BoC) probe, a non-parametric diagnostic of logit coherence that compares softmax confidence $\hat p$ to an aggregate of pairwise Luce-style dominance probabilities $\bar q$, yielding a deterministic coherence score and a p-value-based structural score. Across ViT, ResNet, and RoBERTa with ID/OOD test sets, the coherence gap $\Delta=\bar q-\hat p$ reveals clear ID/OOD separation for ViT (ID ${\sim}0.1$–$0.2$, OOD ${\sim}0.5$–$0.6$) but substantial overlap for ResNet and RoBERTa (both ${\sim}0$), indicating architecture-dependent uncertainty geometry. As a practical method, BoC improves calibration only when the base model is poorly calibrated (ViT: ECE $0.024$ vs. $0.180$) and underperforms standard calibrators (ECE ${\sim}0.005$), while for OOD detection it fails across architectures (AUROC $0.020$–$0.253$) compared to standard scores ($0.75$–$0.99$). We position BoC as a research diagnostic for interrogating how architectures encode uncertainty in logit geometry rather than a production calibration or OOD detection method.
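To make the coherence gap $\Delta=\bar q-\hat p$ concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not say how the pairwise probabilities are aggregated, so the sketch assumes that each Luce-style dominance probability is $q_j = e^{z_{\text{top}}}/(e^{z_{\text{top}}}+e^{z_j})$ for the predicted class against each other class $j$, and that $\bar q$ is their unweighted mean. The function name `coherence_gap` is hypothetical, and the p-value-based structural score is not covered.

```python
import math

def coherence_gap(logits):
    """Return (p_hat, q_bar, delta) for a single logit vector.

    Assumptions (not specified in the abstract): q_bar is the unweighted
    mean of the pairwise Luce probabilities of the top class against
    every other class.
    """
    if len(logits) < 2:
        raise ValueError("need at least two classes")

    # Index and value of the predicted (largest-logit) class.
    top = max(range(len(logits)), key=lambda i: logits[i])
    z_top = logits[top]

    # Softmax confidence of the top class, shifted by z_top for stability.
    p_hat = 1.0 / sum(math.exp(z - z_top) for z in logits)

    # Pairwise dominance: exp(z_top) / (exp(z_top) + exp(z_j)),
    # rewritten as 1 / (1 + exp(z_j - z_top)).
    q = [1.0 / (1.0 + math.exp(z - z_top))
         for i, z in enumerate(logits) if i != top]
    q_bar = sum(q) / len(q)

    return p_hat, q_bar, q_bar - p_hat


# A flat logit vector: softmax confidence is 1/4, each pairwise coin is 1/2.
print(coherence_gap([0.0, 0.0, 0.0, 0.0]))  # (0.25, 0.5, 0.25)
```

Under these assumptions every $q_j$ is at least $\hat p$, because each pairwise denominator drops terms from the softmax denominator, so $\Delta \ge 0$ and grows as probability mass spreads over many competing classes.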
