Nomic Embed: Training a Reproducible Long Context Text Embedder

2 February 2024
Zach Nussbaum
John X. Morris
Brandon Duderstadt
Andriy Mulyar
Abstract

This technical report describes the training of nomic-embed-text-v1, the first fully reproducible, open-source, open-weights, open-data, 8192 context length English text embedding model that outperforms both OpenAI Ada-002 and OpenAI text-embedding-3-small on the short-context MTEB benchmark and the long-context LoCo benchmark. We release the training code and model weights under an Apache 2.0 license. In contrast with other open-source models, we release the full curated training data and code that allows for full replication of nomic-embed-text-v1. You can find code and data to replicate the model at this https URL.

@article{nussbaum2025_2402.01613,
  title={Nomic Embed: Training a Reproducible Long Context Text Embedder},
  author={Zach Nussbaum and John X. Morris and Brandon Duderstadt and Andriy Mulyar},
  journal={arXiv preprint arXiv:2402.01613},
  year={2025}
}