Contrastive learning has gained widespread adoption for retrieval tasks due to its minimal requirement for manual annotations. However, popular training frameworks typically learn from binary (positive/negative) relevance, making them ineffective at incorporating desired rankings. As a result, the poor ranking performance of these models forces systems to employ a re-ranker, which increases complexity, maintenance effort and inference time. To address this, we introduce Generalized Contrastive Learning (GCL), a training framework designed to learn from continuous ranking scores beyond binary relevance. GCL encodes both relevance and ranking information into a unified embedding space by applying ranking scores to the loss function. This enables a single-stage retrieval system. In addition, during our research, we identified a lack of public multi-modal datasets that benchmark both retrieval and ranking capabilities. To facilitate this and future research for ranked retrieval, we curated a large-scale MarqoGS-10M dataset using GPT-4 and Google Shopping, providing ranking scores for each of the 10 million query-document pairs. Our results show that GCL achieves a 29.3% increase in NDCG@10 for in-domain evaluations and 6.0% to 10.0% increases for cold-start evaluations compared to the finetuned CLIP baseline with MarqoGS-10M. Additionally, we evaluated GCL offline on a proprietary user interaction data. GCL shows an 11.2% gain for in-domain evaluations. The dataset and the method are available at:this https URL.
View on arXiv@article{zhu2025_2404.08535, title={ Generalized Contrastive Learning for Multi-Modal Retrieval and Ranking }, author={ Tianyu Zhu and Myong Chol Jung and Jesse Clark }, journal={arXiv preprint arXiv:2404.08535}, year={ 2025 } }