ResearchTrend.AI

arXiv:2405.19532
Contrasting Multiple Representations with the Multi-Marginal Matching Gap

29 May 2024
Zoe Piran
Michal Klein
James Thornton
Marco Cuturi
Abstract

Learning meaningful representations of complex objects that can be seen through multiple ($k \geq 3$) views or modalities is a core task in machine learning. Existing methods use losses originally intended for paired views, and extend them to $k$ views, either by instantiating $\tfrac12 k(k-1)$ loss-pairs, or by using reduced embeddings, following a \textit{one vs. average-of-rest} strategy. We propose the multi-marginal matching gap (M3G), a loss that borrows tools from multi-marginal optimal transport (MM-OT) theory to simultaneously incorporate all $k$ views. Given a batch of $n$ points, each seen as a $k$-tuple of views subsequently transformed into $k$ embeddings, our loss contrasts the cost of matching these $n$ ground-truth $k$-tuples with the MM-OT polymatching cost, which seeks $n$ optimally arranged $k$-tuples chosen within these $n \times k$ vectors. While the exponential complexity $O(n^k)$ of the MM-OT problem may seem daunting, we show in experiments that a suitable generalization of the Sinkhorn algorithm for that problem can scale to, e.g., $k = 3 \sim 6$ views using mini-batches of size $64 \sim 128$. Our experiments demonstrate improved performance over multiview extensions of pairwise losses, for both self-supervised and multimodal tasks.
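The abstract describes contrasting the cost of the $n$ ground-truth $k$-tuples against an entropic MM-OT polymatching cost computed with a multi-marginal generalization of Sinkhorn. The sketch below is a minimal, hedged illustration of that idea, not the paper's implementation: it assumes uniform marginals of weight $1/n$ on each of the $k$ views and a dense cost tensor of shape $(n, \dots, n)$, and iterates the standard log-domain potential updates. The function name `mmot_sinkhorn` and all parameter choices are ours, for illustration only.

```python
import numpy as np
from scipy.special import logsumexp

def mmot_sinkhorn(C, eps=0.1, n_iter=200):
    """Entropic multi-marginal Sinkhorn with uniform marginals (illustrative sketch).

    C      : cost tensor with k axes, each of size n (one axis per view).
    eps    : entropic regularization strength.
    Returns the k dual potentials and the (dense) transport tensor P.
    """
    k, n = C.ndim, C.shape[0]
    log_a = -np.log(n)                      # log of uniform marginal weight 1/n
    f = [np.zeros(n) for _ in range(k)]     # one dual potential per view

    def log_kernel():
        # log P = (f_1 ⊕ ... ⊕ f_k - C) / eps, built by broadcasting
        S = -C / eps
        for i, fi in enumerate(f):
            shape = [1] * k
            shape[i] = n
            S = S + (fi / eps).reshape(shape)
        return S

    for _ in range(n_iter):
        for j in range(k):
            # marginal of the current kernel on axis j, in log space
            other_axes = tuple(a for a in range(k) if a != j)
            log_marginal = logsumexp(log_kernel(), axis=other_axes)
            # scale potential j so its marginal matches the uniform weights
            f[j] = f[j] + eps * (log_a - log_marginal)

    P = np.exp(log_kernel())
    return f, P
```

With the coupling `P` in hand, the matching-gap contrast from the abstract amounts to comparing the mean diagonal cost of the ground-truth tuples, `np.einsum('iii->i', C).mean()` for $k = 3$, with the polymatching cost `(P * C).sum()`. Note the dense tensor makes the $O(n^k)$ cost of the problem explicit; this toy version is only practical for the small $n$ and $k$ regimes the abstract mentions.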
