
Moreau-Yosida f-divergences

Abstract

Variational representations of f-divergences are central to many machine learning algorithms, with Lipschitz-constrained variants recently gaining attention. Inspired by this, we generalize the so-called tight variational representation of f-divergences in the case of probability measures on compact metric spaces to be taken over the space of Lipschitz functions vanishing at an arbitrary base point, characterize functions achieving the supremum in the variational representation, propose a practical algorithm to calculate the tight convex conjugate of f-divergences compatible with automatic differentiation frameworks, define the Moreau-Yosida approximation of f-divergences with respect to the Wasserstein-1 metric, and derive the corresponding variational formulas, providing a generalization of a number of recent results, novel special cases of interest, and a relaxation of the hard Lipschitz constraint. As an application of our theoretical results, we propose the Moreau-Yosida f-GAN, providing an implementation of the variational formulas for the Kullback-Leibler, reverse Kullback-Leibler, $\chi^2$, reverse $\chi^2$, squared Hellinger, Jensen-Shannon, Jeffreys, triangular discrimination and total variation divergences as GANs trained on CIFAR-10, leading to competitive results and a simple solution to the problem of uniqueness of the optimal critic.
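
To make the objects named in the abstract concrete, the display below recalls the classical convex-conjugate variational representation of an f-divergence and one common way of writing a Moreau-Yosida-type envelope with respect to the Wasserstein-1 metric; the test-function class $\mathcal{G}$, the smoothing parameter $\lambda$, and the exact scaling convention are illustrative assumptions for this sketch and are not taken from the paper itself.

\[
D_f(P \,\|\, Q) \;=\; \sup_{g \in \mathcal{G}} \; \mathbb{E}_{x \sim P}\!\left[g(x)\right] \;-\; \mathbb{E}_{x \sim Q}\!\left[f^{*}\!\big(g(x)\big)\right],
\qquad
D_f^{\lambda}(P \,\|\, Q) \;=\; \inf_{\mu} \Big\{ \lambda\, W_1(P, \mu) \;+\; D_f(\mu \,\|\, Q) \Big\},
\]

where $f^{*}$ denotes the convex conjugate of $f$ and $W_1$ the Wasserstein-1 metric; restricting or relaxing the class $\mathcal{G}$ toward Lipschitz functions is what connects the two formulas in the Lipschitz-constrained setting discussed above.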
