ColBERT-serve: Efficient Multi-Stage Memory-Mapped Scoring

Abstract

We study how to serve retrieval models, specifically late-interaction models such as ColBERT, to many concurrent users on a small budget, where the index may not fit in memory. We present ColBERT-serve, a novel serving system that applies a memory-mapping strategy to the ColBERT index, reducing RAM usage by 90% and permitting deployment on cheap servers. It also incorporates a multi-stage architecture with hybrid scoring, which reduces ColBERT's query latency and supports many concurrent queries in parallel.
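
The two ideas named in the abstract, a memory-mapped index and multi-stage hybrid scoring, can be illustrated with a minimal sketch. This is not the ColBERT-serve implementation: the flat float32 file layout, the helper names (`write_index`, `open_index`, `maxsim`, `rerank`), and the linear interpolation weight `alpha` are all assumptions made for illustration, and the abstract does not specify how the hybrid score is computed.

```python
import numpy as np


def write_index(path, doc_embeddings):
    """Write per-token embeddings to a flat float32 file (hypothetical layout).

    Returns the token offsets of each document and the embedding dimension.
    """
    lengths = [e.shape[0] for e in doc_embeddings]
    offsets = np.concatenate([[0], np.cumsum(lengths)])
    flat = np.concatenate(doc_embeddings).astype(np.float32)
    flat.tofile(path)
    return offsets, flat.shape[1]


def open_index(path, dim):
    # np.memmap maps the file into virtual memory; pages are read from disk
    # only when touched, so the whole index need not be resident in RAM.
    return np.memmap(path, dtype=np.float32, mode="r").reshape(-1, dim)


def maxsim(query, doc):
    # Late interaction: for each query token take the maximum similarity
    # over the document's tokens, then sum over query tokens.
    return float((query @ doc.T).max(axis=1).sum())


def rerank(query, candidates, first_stage_scores, index, offsets, alpha=0.5):
    """Second stage: rescore first-stage candidates only.

    The hybrid score here is an assumed linear interpolation of the
    first-stage score and the late-interaction score.
    """
    scored = []
    for doc_id in candidates:
        doc = index[offsets[doc_id]:offsets[doc_id + 1]]  # reads only these rows
        hybrid = alpha * first_stage_scores[doc_id] + (1 - alpha) * maxsim(query, doc)
        scored.append((doc_id, hybrid))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Because only the candidate documents' rows are sliced from the mapped file, memory use in this sketch scales with the candidate set rather than with the full index, which is the general property a memory-mapping strategy relies on.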

@article{huang2025_2504.14903,
  title={ColBERT-serve: Efficient Multi-Stage Memory-Mapped Scoring},
  author={Kaili Huang and Thejas Venkatesh and Uma Dingankar and Antonio Mallia and Daniel Campos and Jian Jiao and Christopher Potts and Matei Zaharia and Kwabena Boahen and Omar Khattab and Saarthak Sarup and Keshav Santhanam},
  journal={arXiv preprint arXiv:2504.14903},
  year={2025}
}