
The Geometry of Mixability

Abstract

Mixable loss functions are of fundamental importance in the context of prediction with expert advice in the online setting, since they characterize fast learning rates. By re-interpreting properness from the point of view of differential geometry, we provide a simple geometric characterization of mixability for the binary and multi-class cases: a proper loss function $\ell$ is $\eta$-mixable if and only if the superprediction set $\textrm{spr}(\eta\ell)$ of the scaled loss function $\eta\ell$ slides freely inside the superprediction set $\textrm{spr}(\ell_{\log})$ of the log loss $\ell_{\log}$, under fairly general assumptions on the differentiability of $\ell$. Our approach provides a way to treat some concepts concerning loss functions (such as properness) in a "coordinate-free" manner and reconciles previous results obtained for mixable loss functions in the binary and multi-class cases.
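For context, the following are the standard definitions of the superprediction set and of $\eta$-mixability from the prediction-with-expert-advice literature; the notation below is ours and is not taken from the paper itself. A sketch, assuming $n$ outcomes and a prediction space $\mathcal{V}$:

\[
  \textrm{spr}(\ell) \;=\; \bigl\{ x \in [0,\infty]^n : \exists\, v \in \mathcal{V} \ \text{with}\ x_y \ge \ell(y, v)\ \text{for all } y \bigr\},
\]
\[
  \ell \ \text{is } \eta\text{-mixable} \;\iff\;
  \forall\, v_1,\dots,v_k \in \mathcal{V},\ \forall\, p \in \Delta_k,\ \exists\, v \in \mathcal{V}:\quad
  \ell(y, v) \;\le\; -\tfrac{1}{\eta}\log \sum_{i=1}^{k} p_i\, e^{-\eta\,\ell(y, v_i)} \ \ \text{for all } y.
\]

Equivalently, $\ell$ is $\eta$-mixable when the image of $\textrm{spr}(\ell)$ under the coordinatewise map $z \mapsto e^{-\eta z}$ is convex; the abstract's "slides freely" condition is the paper's geometric reformulation of this property relative to the log loss.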
