Calibration Attention: Learning Reliability-Aware Representations for Vision Transformers
Most calibration methods operate at the logit level, implicitly assuming that miscalibration can be corrected without changing the underlying representation. We challenge this assumption and propose \textbf{Calibration Attention (CalAttn)}, a \emph{representation-aware} calibration module for vision transformers that couples instance-wise temperature scaling to transformer token geometry under a proper scoring objective. CalAttn predicts a sample-specific temperature from the \texttt{[CLS]} token and backpropagates calibration gradients into the backbone, thereby reshaping the uncertainty structure of the representation rather than post-hoc adjusting confidence. This yields \emph{token-conditioned uncertainty modulation} with negligible overhead (\(<0.1\%\) additional parameters). Across multiple datasets with ViT/DeiT/Swin backbones, CalAttn consistently improves calibration while preserving accuracy, achieving relative ECE reductions of \(3.7\%\) to \(77.7\%\) over strong baselines across diverse training objectives. Our results indicate that treating calibration as a representation-level problem is a practical and effective direction for trustworthy uncertainty estimation in transformers. Code: [this https URL](this https URL)
View on arXiv