In this paper, we propose Wasserstein proximals of -divergences as suitable objective functionals for learning heavy-tailed distributions in a stable manner. First, we provide sufficient, and in some cases necessary, relations among data dimension, , and the decay rate of data distributions for the Wasserstein-proximal-regularized divergence to be finite. Finite-sample convergence rates for the estimation in the case of the Wasserstein-1 proximal divergences are then provided under certain tail conditions. Numerical experiments demonstrate stable learning of heavy-tailed distributions -- even those without first or second moment -- without any explicit knowledge of the tail behavior, using suitable generative models such as GANs and flow-based models related to our proposed Wasserstein-proximal-regularized -divergences. Heuristically, -divergences handle the heavy tails and Wasserstein proximals allow non-absolute continuity between distributions and control the velocities of flow-based algorithms as they learn the target distribution deep into the tails.
View on arXiv