DS SERVE: A Framework for Efficient and Scalable Neural Retrieval

17 December 2025

Jinjian Liu

Yichuan Wang

Xinxi Lyu

Rulin Shao

Joseph E. Gonzalez

Matei Zaharia

Sewon Min

Main: 2 pages

1 figure

Bibliography: 1 page

1 table

Abstract

We present DS-Serve, a framework that transforms large-scale text datasets, comprising half a trillion tokens, into a high-performance neural retrieval system. DS-Serve offers both a web interface and API endpoints, achieving low latency with modest memory overhead on a single node. The framework also supports inference-time trade-offs between latency, accuracy, and result diversity. We anticipate that DS-Serve will be broadly useful for a range of applications, including large-scale retrieval-augmented generation (RAG), training data attribution, training search agents, and beyond.
