A Tunable Loss Function for Robust Classification: Calibration, Landscape, and Generalization

Abstract

We introduce a tunable loss function called $\alpha$-loss, parameterized by $\alpha \in (0,\infty]$, which interpolates between the exponential loss ($\alpha = 1/2$), the log-loss ($\alpha = 1$), and the 0-1 loss ($\alpha = \infty$), for the machine learning setting of classification. Theoretically, we illustrate a fundamental connection between $\alpha$-loss and Arimoto conditional entropy, verify the classification-calibration of $\alpha$-loss in order to demonstrate asymptotic optimality via Rademacher complexity generalization techniques, and build upon a notion called strictly local quasi-convexity in order to quantitatively characterize the optimization landscape of $\alpha$-loss. Practically, we perform class imbalance, robustness, and classification experiments on benchmark image datasets using convolutional neural networks. Our main practical conclusion is that certain tasks may benefit from tuning $\alpha$-loss away from log-loss ($\alpha = 1$), and to this end we provide simple heuristics for the practitioner. In particular, navigating the $\alpha$ hyperparameter can readily provide superior model robustness to label flips ($\alpha > 1$) and sensitivity to imbalanced classes ($\alpha < 1$).
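The abstract does not reproduce the loss itself; a commonly used form in the $\alpha$-loss literature, acting on the predicted probability $p$ of the true label, is $\ell_\alpha(p) = \frac{\alpha}{\alpha - 1}\left(1 - p^{(\alpha-1)/\alpha}\right)$ for $\alpha \neq 1$, with the log-loss $-\log p$ recovered as $\alpha \to 1$. The sketch below is a minimal NumPy illustration of this form and the three limiting cases named above; it is our own illustrative code (the function name and clipping constant are assumptions), not an implementation from the paper.

```python
import numpy as np

def alpha_loss(p, alpha):
    """Alpha-loss on the predicted probability p of the true label.

    Interpolates between the exponential loss (alpha = 1/2),
    the log-loss (alpha = 1), and a soft 0-1 loss (alpha -> infinity).
    """
    p = np.clip(p, 1e-12, 1.0)  # guard against log(0) and division by zero
    if np.isinf(alpha):
        return 1.0 - p          # limiting case alpha -> infinity: 1 - p
    if np.isclose(alpha, 1.0):
        return -np.log(p)       # log-loss at alpha = 1
    # general case: (alpha / (alpha - 1)) * (1 - p^((alpha - 1) / alpha))
    return (alpha / (alpha - 1.0)) * (1.0 - p ** ((alpha - 1.0) / alpha))

# Sanity checks against the stated special cases:
p = np.array([0.9, 0.6, 0.3])
print(alpha_loss(p, 0.5))     # equals 1/p - 1, the exponential loss under a sigmoid link
print(alpha_loss(p, 1.0))     # log-loss, -log(p)
print(alpha_loss(p, np.inf))  # 1 - p, a smooth surrogate for the 0-1 loss
```

At $\alpha = 1/2$ the general expression reduces to $1/p - 1$, which equals $e^{-z}$ when $p$ is produced by a sigmoid applied to a margin $z$, matching the exponential-loss endpoint of the interpolation.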
