Reducing Over-confident Errors outside the Known Distribution
Intuitively, unfamiliarity should lead to lack of confidence. In reality, current algorithms often make highly confident yet wrong predictions when faced with unexpected test samples from an unknown distribution different from training. Unlike domain adaptation methods, we cannot gather an "unexpected dataset" prior to test, and unlike novelty detection methods, the input examples have labels within the label domain, and a best-effort prediction is required. We compare a number of methods from related fields such as calibration and epistemic uncertainty modeling. Two proposed variants that reduce overconfident errors on an unknown novel distribution are also compared: (1) -distillation, training an ensemble of classifiers and then distilling into a single model using both labeled and unlabeled examples, and (2) NCR, reducing the confidence of a prediction based on the estimated novelty of the example. Experimentally, we investigate the overconfidence problem and evaluate methods by creating "familiar" and "novel" test splits, where "familiar" are identically distributed with training and "novel" are not. We discover that, surprisingly given its simplicity, calibrating using temperature scaling on only familiar data is the best single-model method for improving image confidence, followed by our proposed methods, then by other methods that may take unknown distributions into account. The top performing methods can also reduce confident errors, e.g. by 95\% in recognizing gender in demographic groups different from the training data.
View on arXiv