WARP: An Efficient Engine for Multi-Vector Retrieval

29 January 2025
Jan Luca Scheerer
Matei A. Zaharia
Christopher Potts
Gustavo Alonso
Omar Khattab
Abstract

Multi-vector retrieval methods such as ColBERT and its recent variant, the ConteXtualized Token Retriever (XTR), offer high accuracy but face efficiency challenges at scale. To address this, we present WARP, a retrieval engine that substantially improves the efficiency of retrievers trained with the XTR objective through three key innovations: (1) WARP$_\text{SELECT}$ for dynamic similarity imputation; (2) implicit decompression, avoiding costly vector reconstruction during retrieval; and (3) a two-stage reduction process for efficient score aggregation. Combined with highly-optimized C++ kernels, our system reduces end-to-end latency compared to XTR's reference implementation by 41x, and achieves a 3x speedup over the ColBERTv2/PLAID engine, while preserving retrieval quality.

@article{scheerer2025_2501.17788,
  title={WARP: An Efficient Engine for Multi-Vector Retrieval},
  author={Jan Luca Scheerer and Matei Zaharia and Christopher Potts and Gustavo Alonso and Omar Khattab},
  journal={arXiv preprint arXiv:2501.17788},
  year={2025}
}