
Moreau-Yosida f-divergences

International Conference on Machine Learning (ICML), 2021
Abstract

Variational representations of f-divergences are central to many machine learning algorithms, with Lipschitz-constrained variants recently gaining attention. Inspired by this, we define the Moreau-Yosida approximation of f-divergences with respect to the Wasserstein-1 metric. The corresponding variational formulas generalize a number of recent results, yield novel special cases of interest, and relax the hard Lipschitz constraint. Additionally, we prove that the so-called tight variational representation of f-divergences can be taken over the quotient space of Lipschitz functions, and we characterize the functions achieving the supremum in the variational representation. On the practical side, we propose an algorithm to calculate the tight convex conjugate of f-divergences that is compatible with automatic differentiation frameworks. As an application of our results, we propose the Moreau-Yosida f-GAN, which implements the variational formulas for the Kullback-Leibler, reverse Kullback-Leibler, χ², reverse χ², squared Hellinger, Jensen-Shannon, Jeffreys, triangular discrimination and total variation divergences as GANs trained on CIFAR-10, leading to competitive results and a simple solution to the problem of uniqueness of the optimal critic.
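The idea of a tight convex conjugate can be illustrated numerically. The sketch below assumes the standard Lagrangian form of the conjugate of D_f(·‖Q) restricted to probability measures, namely the infimum over λ of λ + E_Q[f*(g − λ)], with E_Q replaced by an average over samples from Q and with f* convex and differentiable. The helper name `tight_conjugate` and the bisection solver are hypothetical illustration choices, not the authors' algorithm. For the Kullback-Leibler divergence with f(u) = u log u, f*(t) = exp(t − 1), so the infimum should reproduce the Donsker-Varadhan term log E_Q[exp g], while the untightened term E_Q[f*(g)] can only be larger.

```python
import math
from typing import Callable, Sequence


def tight_conjugate(
    g: Sequence[float],
    f_star: Callable[[float], float],
    f_star_prime: Callable[[float], float],
    tol: float = 1e-12,
    max_expand: int = 100,
) -> float:
    """Hypothetical helper: inf over lam of lam + mean_i f_star(g_i - lam).

    g holds critic values g(x_i) on samples x_i ~ Q (an empirical stand-in for E_Q).
    Assumes f_star is convex and differentiable, so the objective is convex in lam.
    """
    n = len(g)

    def slope(lam: float) -> float:
        # d/dlam [lam + mean f*(g - lam)] = 1 - mean f*'(g - lam), nondecreasing in lam.
        return 1.0 - sum(f_star_prime(x - lam) for x in g) / n

    # Bracket the zero of the slope, expanding geometrically if needed.
    lo, hi = min(g) - 1.0, max(g) + 1.0
    step = 1.0
    for _ in range(max_expand):
        if slope(lo) <= 0.0:
            break
        lo -= step
        step *= 2.0
    else:
        raise ValueError("could not bracket the minimizer from below")
    step = 1.0
    for _ in range(max_expand):
        if slope(hi) >= 0.0:
            break
        hi += step
        step *= 2.0
    else:
        raise ValueError("could not bracket the minimizer from above")

    # Bisection on the monotone slope (fixed iteration cap avoids float stalls).
    for _ in range(200):
        if hi - lo <= tol:
            break
        mid = 0.5 * (lo + hi)
        if slope(mid) < 0.0:
            lo = mid  # minimizer lies to the right of mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return lam + sum(f_star(x - lam) for x in g) / n


def kl_star(t: float) -> float:
    # Convex conjugate of f(u) = u log u (KL); it is also its own derivative.
    return math.exp(t - 1.0)


g = [0.0, 1.0, -0.5, 3.0]  # illustrative critic values on samples from Q
tight = tight_conjugate(g, kl_star, kl_star)
dv = math.log(sum(math.exp(x) for x in g) / len(g))  # Donsker-Varadhan term
naive = sum(kl_star(x) for x in g) / len(g)  # untightened E_Q[f*(g)]
print(tight, dv, naive)
```

Because the objective is convex in λ, its derivative is monotone and bisection suffices. Non-smooth generators, such as the one for total variation, would need a subgradient-based bracket instead of the derivative used here.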
