arXiv: 2511.00903

ColMate: Contrastive Late Interaction and Masked Text for Multimodal Document Retrieval

2 November 2025
Ahmed Masry
Megh Thakkar
Patrice Bechard
Sathwik Tejaswi Madhusudhan
Rabiul Awal
Shambhavi Mishra
Akshay Kalkunte Suresh
Srivatsava Daruru
Enamul Hoque
Spandana Gella
Torsten Scholak
Sai Rajeswar
    VLM
Main: 6 pages
2 figures
Bibliography: 3 pages
4 tables
Appendix: 1 page
Abstract

Retrieval-augmented generation has proven practical when models require specialized knowledge or access to the latest data. However, existing methods for multimodal document retrieval often replicate techniques developed for text-only retrieval, whether in how they encode documents, define training objectives, or compute similarity scores. To address these limitations, we present ColMate, a document retrieval model that bridges the gap between multimodal representation learning and document retrieval. ColMate combines a novel OCR-based pretraining objective, a self-supervised masked contrastive learning objective, and a late-interaction scoring mechanism better suited to the structure and visual characteristics of multimodal documents. ColMate improves over existing retrieval models by 3.61% on the ViDoRe V2 benchmark, demonstrating stronger generalization to out-of-domain benchmarks.
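
The abstract does not spell out ColMate's own scoring variant or its masked objective. As background only, here is a minimal sketch of the standard ColBERT-style "MaxSim" late-interaction score that multi-vector document retrievers build on, plus a plain in-batch contrastive (InfoNCE) loss computed over those scores. All function names, the toy 2-dimensional embeddings, and the temperature of 0.05 are hypothetical choices for illustration; the masking and OCR-based pretraining that ColMate adds are not modeled here.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def normalize(v: Vector) -> Vector:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_tokens: List[Vector], doc_patches: List[Vector]) -> float:
    """Generic late interaction: for each query token embedding, take the best
    cosine match among all document patch embeddings, then sum over query tokens."""
    q = [normalize(t) for t in query_tokens]
    d = [normalize(p) for p in doc_patches]
    return sum(max(dot(qt, dp) for dp in d) for qt in q)


def rank_documents(
    query_tokens: List[Vector], corpus: Dict[str, List[Vector]]
) -> List[Tuple[str, float]]:
    """Score every document and return (doc_id, score) pairs, best first."""
    scores = [(doc_id, maxsim_score(query_tokens, patches)) for doc_id, patches in corpus.items()]
    return sorted(scores, key=lambda s: s[1], reverse=True)


def in_batch_contrastive_loss(
    queries: List[List[Vector]], docs: List[List[Vector]], temperature: float = 0.05
) -> float:
    """Plain InfoNCE: queries[i] is paired with docs[i]; every other document in
    the batch acts as a negative. Uses a numerically stable log-sum-exp."""
    losses = []
    for i, q in enumerate(queries):
        logits = [maxsim_score(q, d) / temperature for d in docs]
        m = max(logits)
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        losses.append(log_z - logits[i])  # negative log-probability of the positive
    return sum(losses) / len(losses)


if __name__ == "__main__":
    query = [[1.0, 0.0], [0.0, 1.0]]
    corpus = {"doc_a": [[1.0, 0.0], [0.0, 1.0]], "doc_b": [[1.0, 1.0]]}
    print(rank_documents(query, corpus))
```

Running the script ranks `doc_a` (score 2.0, since each query token finds an exact match) above `doc_b` (score about 1.414, since its single patch only partially matches each token). The per-token maximum is what lets a multi-vector retriever reward a page where different query terms match different regions, which a single pooled embedding cannot express.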
