ColBERT-serve: Efficient Multi-Stage Memory-Mapped Scoring
Kaili Huang
Thejas Venkatesh
Uma Dingankar
Antonio Mallia
Daniel Campos
Jian Jiao
Christopher Potts
Matei A. Zaharia
Kwabena Boahen
Omar Khattab
Saarthak Sarup
Keshav Santhanam

Abstract
We study the problem of serving retrieval models, specifically late interaction models like ColBERT, to many concurrent users under a small budget, where the index may not fit in memory. We present ColBERT-serve, a novel serving system that applies a memory-mapping strategy to the ColBERT index, reducing RAM usage by 90% and permitting deployment on cheap servers. It also incorporates a multi-stage architecture with hybrid scoring, which reduces ColBERT's query latency and supports many concurrent queries in parallel.
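The two ideas named in the abstract, a memory-mapped index and multi-stage hybrid scoring, can be illustrated with a minimal Python sketch. This is not the ColBERT-serve implementation. The file layout, the names `write_index`, `open_index`, `maxsim`, and `rerank`, the embedding width `DIM`, and the linear interpolation weight `alpha` are all assumptions made for illustration. The sketch uses NumPy's `memmap` so that only the pages of candidate documents are read from disk, and it combines a first-stage score with a ColBERT-style MaxSim score.

```python
import os
import tempfile
import numpy as np

DIM = 8  # hypothetical embedding width, chosen only for this sketch


def write_index(path, doc_embeddings):
    """Write all token embeddings to one flat float32 file.

    Returns token offsets so that document i occupies rows
    offsets[i]:offsets[i + 1] of the file.
    """
    offsets = np.cumsum([0] + [len(d) for d in doc_embeddings])
    np.concatenate(doc_embeddings).astype(np.float32).tofile(path)
    return offsets


def open_index(path, total_tokens):
    # mode="r" maps the file read-only; the OS pages data in lazily,
    # so resident memory grows only with the rows actually touched.
    return np.memmap(path, dtype=np.float32, mode="r",
                     shape=(total_tokens, DIM))


def maxsim(query, doc):
    # Late interaction: for each query token, take the maximum
    # similarity over document tokens, then sum over query tokens.
    return float((query @ doc.T).max(axis=1).sum())


def rerank(query, index, offsets, candidates, alpha=0.5):
    """Rescore first-stage candidates with a hybrid score.

    candidates: list of (doc_id, first_stage_score) pairs.
    The linear interpolation with alpha is an assumed form of
    hybrid scoring, not taken from the paper.
    """
    scored = []
    for doc_id, first_stage in candidates:
        doc = index[offsets[doc_id]:offsets[doc_id + 1]]
        hybrid = alpha * first_stage + (1 - alpha) * maxsim(query, doc)
        scored.append((doc_id, hybrid))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

In this sketch a cheap first stage supplies the `candidates` list, so the memory-mapped file is read only for those documents rather than for the whole collection.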
@article{huang2025_2504.14903,
  title={ColBERT-serve: Efficient Multi-Stage Memory-Mapped Scoring},
  author={Kaili Huang and Thejas Venkatesh and Uma Dingankar and Antonio Mallia and Daniel Campos and Jian Jiao and Christopher Potts and Matei Zaharia and Kwabena Boahen and Omar Khattab and Saarthak Sarup and Keshav Santhanam},
  journal={arXiv preprint arXiv:2504.14903},
  year={2025}
}