
Bag of Coins: A Statistical Probe into Neural Confidence Structures

Main: 8 pages · Bibliography: 1 page · Appendix: 7 pages · 11 figures · 6 tables
Abstract

Modern neural networks often produce miscalibrated confidence scores and struggle to detect out-of-distribution (OOD) inputs, while most existing methods post-process outputs without testing internal consistency. We introduce the Bag-of-Coins (BoC) probe, a non-parametric diagnostic of logit coherence that compares softmax confidence $\hat p$ to an aggregate of pairwise Luce-style dominance probabilities $\bar q$, yielding a deterministic coherence score and a p-value-based structural score. Across ViT, ResNet, and RoBERTa with ID/OOD test sets, the coherence gap $\Delta = \bar q - \hat p$ reveals clear ID/OOD separation for ViT (ID ${\sim}0.1$-$0.2$, OOD ${\sim}0.5$-$0.6$) but substantial overlap for ResNet and RoBERTa (both ${\sim}0$), indicating architecture-dependent uncertainty geometry. As a practical method, BoC improves calibration only when the base model is poorly calibrated (ViT: ECE $0.024$ vs. $0.180$) and underperforms standard calibrators (ECE ${\sim}0.005$), while for OOD detection it fails across architectures (AUROC $0.020$-$0.253$) compared to standard scores ($0.75$-$0.99$). We position BoC as a research diagnostic for interrogating how architectures encode uncertainty in logit geometry rather than a production calibration or OOD detection method.
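
The abstract does not spell out how the pairwise probabilities are formed or aggregated, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that the Luce-style dominance probability of the predicted class $c$ over a rival $j$ is $e^{z_c}/(e^{z_c}+e^{z_j})$, and that $\bar q$ is the plain mean of these values over all rivals. The function name `coherence_gap` is hypothetical, and the p-value-based structural score is left out because the abstract gives no details on it.

```python
import numpy as np

def coherence_gap(logits):
    """Return (p_hat, q_bar, delta) for one logit vector.

    Assumptions (not confirmed by the abstract):
      * pairwise dominance of the top class c over rival j is
        exp(z_c) / (exp(z_c) + exp(z_j)), i.e. sigmoid(z_c - z_j);
      * q_bar is the unweighted mean over all rivals j != c.
    """
    z = np.asarray(logits, dtype=float)
    z = z - z.max()                      # shift for numerical stability
    probs = np.exp(z) / np.exp(z).sum()  # softmax
    top = int(np.argmax(probs))
    p_hat = float(probs[top])            # softmax confidence

    rivals = np.delete(z, top)           # logits of all other classes
    # Two-way Luce choice probability of the top class against each rival.
    pairwise = 1.0 / (1.0 + np.exp(rivals - z[top]))
    q_bar = float(pairwise.mean())

    return p_hat, q_bar, q_bar - p_hat

if __name__ == "__main__":
    for name, logits in [("peaked", [8.0, 1.0, 0.5, 0.0]),
                         ("flat", [0.2, 0.1, 0.0, 0.0])]:
        p_hat, q_bar, delta = coherence_gap(logits)
        print(f"{name}: p_hat={p_hat:.3f} q_bar={q_bar:.3f} delta={delta:.3f}")
```

Under these assumptions each pairwise term is at least as large as $\hat p$, because its denominator drops every other class, so $\Delta \ge 0$ by construction. The gap is near zero when one rival carries almost all of the remaining mass and grows when that mass is spread over many classes.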
