arXiv:2302.04953

The Monge Gap: A Regularizer to Learn All Transport Maps

9 February 2023
Théo Uscidda
Marco Cuturi
    OT
Abstract

Optimal transport (OT) theory has been used in machine learning to study and characterize maps that can efficiently push forward one probability measure onto another. Recent works have drawn inspiration from Brenier's theorem, which states that when the ground cost is the squared-Euclidean distance, the "best" map to morph a continuous measure in $\mathcal{P}(\mathbb{R}^d)$ into another must be the gradient of a convex function. To exploit that result, [Makkuva+ 2020, Korotin+ 2020] consider maps $T = \nabla f_\theta$, where $f_\theta$ is an input convex neural network (ICNN), as defined by [Amos+ 2017], and fit $\theta$ with SGD using samples. Despite their mathematical elegance, fitting OT maps with ICNNs raises many challenges, due notably to the many constraints imposed on $\theta$; the need to approximate the conjugate of $f_\theta$; or the limitation that they only work for the squared-Euclidean cost. More generally, we question the relevance of using Brenier's result, which only applies to densities, to constrain the architecture of candidate maps fitted on samples. Motivated by these limitations, we propose a radically different approach to estimating OT maps: given a cost $c$ and a reference measure $\rho$, we introduce a regularizer, the Monge gap $\mathcal{M}^c_\rho(T)$ of a map $T$. That gap quantifies how far a map $T$ deviates from the ideal properties we expect from a $c$-OT map. In practice, we drop all architecture requirements for $T$ and simply minimize a distance (e.g., the Sinkhorn divergence) between $T\sharp\mu$ and $\nu$, regularized by $\mathcal{M}^c_\rho(T)$. We study $\mathcal{M}^c_\rho$, and show how our simple pipeline significantly outperforms other baselines in practice.
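
To make the regularizer concrete, here is a minimal sketch. It assumes the Monge gap is the average cost paid by $T$ under $\rho$ minus the optimal transport cost between $\rho$ and $T\sharp\rho$, i.e. $\mathcal{M}^c_\rho(T) = \int c(x, T(x))\,d\rho(x) - W_c(\rho, T\sharp\rho)$. Here $\rho$ is replaced by a uniform measure on a handful of sample points, and the optimal cost is found by enumerating all permutations, which is exact but only feasible for very small samples. The names `monge_gap` and `sq_euclidean` and the sample points are hypothetical and chosen for illustration. A real pipeline would use a scalable OT solver instead of enumeration.

```python
from itertools import permutations


def sq_euclidean(x, y):
    """Squared-Euclidean ground cost between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def monge_gap(T, points, cost=sq_euclidean):
    """Monge gap of T w.r.t. the uniform measure on `points` (brute-force sketch)."""
    images = [T(x) for x in points]
    n = len(points)
    # Average cost actually paid by T when moving each x to T(x).
    map_cost = sum(cost(x, y) for x, y in zip(points, images)) / n
    # Optimal cost between the points and their images: for two uniform
    # measures on n points each, an optimal coupling is a permutation,
    # so we can enumerate all of them (illustration only, O(n!)).
    ot_cost = min(
        sum(cost(points[i], images[j]) for i, j in enumerate(perm)) / n
        for perm in permutations(range(n))
    )
    return map_cost - ot_cost


def shift(x):
    """Translation: the gradient of a convex function, hence optimal for this cost."""
    return (x[0] + 1.0, x[1] - 2.0)


def flip(x):
    """Point reflection through the origin: moves mass wastefully."""
    return (-x[0], -x[1])


points = [(0.0, 0.0), (1.0, 0.5), (-0.5, 2.0), (2.0, -1.0), (0.3, 1.2)]

gap_shift = monge_gap(shift, points)  # zero: the translation is already optimal
gap_flip = monge_gap(flip, points)    # strictly positive: cheaper pairings exist
```

The gap is zero when no rearrangement of the images is cheaper than the pairing chosen by $T$, and positive otherwise, which is why it can serve as a penalty added to a fitting loss between $T\sharp\mu$ and $\nu$.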
