
Not Just How Much, But Where: Decomposing Epistemic Uncertainty into Per-Class Contributions

Mame Diarra Toure
David A. Stephens
Main: 10 pages
18 figures
Bibliography: 3 pages
36 tables
Appendix: 25 pages
Abstract

In safety-critical classification, the cost of failure is often asymmetric, yet Bayesian deep learning summarises epistemic uncertainty with a single scalar, mutual information (MI), that cannot distinguish whether a model's ignorance involves a benign or a safety-critical class. We decompose MI into a per-class vector $C_k(x)=\sigma_k^{2}/(2\mu_k)$, with $\mu_k=\mathbb{E}[p_k]$ and $\sigma_k^{2}=\mathrm{Var}[p_k]$ computed across posterior samples. The decomposition follows from a second-order Taylor expansion of the entropy; the $1/\mu_k$ weighting corrects boundary suppression and makes $C_k$ comparable across rare and common classes. By construction $\sum_k C_k \approx \mathrm{MI}$, and a companion skewness diagnostic flags inputs where the approximation degrades. After characterising the axiomatic properties of $C_k$, we validate it on three tasks: (i) selective prediction for diabetic retinopathy, where critical-class $C_k$ reduces selective risk by 34.7% over MI and 56.2% over variance baselines; (ii) out-of-distribution detection on clinical and image benchmarks, where $\sum_k C_k$ achieves the highest AUROC and the per-class view exposes asymmetric shifts invisible to MI; and (iii) a controlled label-noise study in which $\sum_k C_k$ shows less sensitivity to injected aleatoric noise than MI under end-to-end Bayesian training, while both metrics degrade under transfer learning. Across all tasks, the quality of the posterior approximation shapes uncertainty at least as strongly as the choice of metric, suggesting that how uncertainty is propagated through the network matters as much as how it is measured.
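To make the decomposition concrete, the sketch below (ours, not code from the paper) shows one way to compute the per-class contributions $C_k(x)=\mathrm{Var}[p_k]/(2\,\mathbb{E}[p_k])$ and compare their sum against MI, assuming the posterior samples of the predictive distribution come from something like MC dropout or a deep ensemble; the Dirichlet draws and function names are illustrative placeholders.

```python
import numpy as np

def mutual_information(probs):
    """BALD-style mutual information from posterior samples of class probabilities.

    probs: array of shape (S, K) -- S posterior samples of a K-class
    predictive distribution for a single input x.
    """
    mean_p = probs.mean(axis=0)                                         # E[p_k]
    entropy_of_mean = -np.sum(mean_p * np.log(mean_p + 1e-12))          # H(E[p])
    mean_of_entropy = -np.mean(np.sum(probs * np.log(probs + 1e-12), axis=1))  # E[H(p)]
    return entropy_of_mean - mean_of_entropy

def per_class_contributions(probs):
    """Per-class epistemic contributions C_k(x) = Var[p_k] / (2 * E[p_k])."""
    mu = probs.mean(axis=0)      # E[p_k] across posterior samples
    var = probs.var(axis=0)      # Var[p_k] across posterior samples
    return var / (2.0 * mu + 1e-12)

# Illustration with synthetic posterior samples (Dirichlet draws stand in
# for MC-dropout or deep-ensemble predictive samples).
rng = np.random.default_rng(0)
probs = rng.dirichlet(alpha=[2.0, 1.0, 0.5], size=200)   # shape (S=200, K=3)

C = per_class_contributions(probs)
mi = mutual_information(probs)
print("C_k:", C)
print("sum_k C_k =", C.sum(), " MI =", mi)   # sum_k C_k should roughly track MI
```

Because $\sum_k C_k$ is only a second-order approximation of MI, the printed sum and MI will agree closely when the sampled probabilities are not too skewed, which is exactly the regime the paper's skewness diagnostic is meant to flag.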
