Cost-Effective, Low Latency Vector Search with Azure Cosmos DB

Abstract

Vector indexing enables semantic search over diverse corpora and has become an important interface to databases for both users and AI agents. Efficient vector search requires deep optimizations in database systems. This has motivated a new class of specialized vector databases that optimize for vector search quality and cost. Instead, we argue that a scalable, high-performance, and cost-efficient vector search system can be built inside a cloud-native operational database like Azure Cosmos DB while leveraging the benefits of a distributed database such as high availability, durability, and scale. We do this by deeply integrating DiskANN, a state-of-the-art vector indexing library, inside Azure Cosmos DB NoSQL. This system uses a single vector index per partition, stored in existing index trees and kept in sync with the underlying data. It supports < 20 ms query latency over an index spanning 10 million vectors, has stable recall over updates, and offers nearly 15x and 41x lower query cost compared to Zilliz and Pinecone serverless enterprise products, respectively. It also scales out to billions of vectors via automatic partitioning. This convergent design presents a point in favor of integrating vector indices into operational databases in the context of recent debates on specialized vector databases, and offers a template for vector indexing in other databases.


@article{upreti2025_2505.05885,
  title={Cost-Effective, Low Latency Vector Search with Azure Cosmos DB},
  author={Nitish Upreti and Krishnan Sundaram and Hari Sudan Sundar and Samer Boshra and Balachandar Perumalswamy and Shivam Atri and Martin Chisholm and Revti Raman Singh and Greg Yang and Subramanyam Pattipaka and Tamara Hass and Nitesh Dudhey and James Codella and Mark Hildebrand and Magdalen Manohar and Jack Moffitt and Haiyang Xu and Naren Datha and Suryansh Gupta and Ravishankar Krishnaswamy and Prashant Gupta and Abhishek Sahu and Ritika Mor and Santosh Kulkarni and Hemeswari Varada and Sudhanshu Barthwal and Amar Sagare and Dinesh Billa and Zishan Fu and Neil Deshpande and Shaun Cooper and Kevin Pilch and Simon Moreno and Aayush Kataria and Vipul Vishal and Harsha Vardhan Simhadri},
  journal={arXiv preprint arXiv:2505.05885},
  year={2025}
}
