Arctic-Embed: Scalable, Efficient, and Accurate Text Embedding Models

8 May 2024

Papers citing "Arctic-Embed: Scalable, Efficient, and Accurate Text Embedding Models"

39 / 39 papers shown

From Topology to Retrieval: Decoding Embedding Spaces with Unified Signatures

140

27 Nov 2025

Improving Romanian LLM Pretraining Data using Diversity and Quality Filtering

Vlad Negoita

Mihai Masala

Traian Rebedea

123

02 Nov 2025

GigaEmbeddings: Efficient Russian Language Embedding Model

124

25 Oct 2025

CoRECT: A Framework for Evaluating Embedding Compression Techniques at Scale

148

22 Oct 2025

MOSAIC: Masked Objective with Selective Adaptation for In-domain Contrastive Learning

Vera Pavlova

Mohammed Makhlouf

CLL

152

19 Oct 2025

Fantastic (small) Retrievers and How to Train Them: mxbai-edge-colbert-v0 Tech Report

106

16 Oct 2025

DMRetriever: A Family of Models for Improved Text Retrieval in Disaster Management

133

16 Oct 2025

SHANKS: Simultaneous Hearing and Thinking for Spoken Language Models

184

08 Oct 2025

Compressed Concatenation of Small Embedding Models

M. Ayoub Ben Ayad

Michael Dinzinger

Kanishka Ghosh Dastidar

Jelena Mitrović

Michael Granitzer

104

06 Oct 2025

The Data-Quality Illusion: Rethinking Classifier-Based Quality Filtering for LLM Pretraining

145

01 Oct 2025

LEAF: Knowledge Distillation of Text Embedding Models with Teacher-Aligned Representations

Robin Vujanic

Thomas Rueckstiess

116

16 Sep 2025

How to Evaluate Medical AI

215

15 Sep 2025

Boosting Data Utilization for Multilingual Dense Retrieval

141

11 Sep 2025

Chronological Passage Assembling in RAG framework for Temporal Question Answering

107

26 Aug 2025

Retrieval Capabilities of Large Language Models Scale with Pretraining FLOPs

140

24 Aug 2025

On The Role of Pretrained Language Models in General-Purpose Text Embeddings: A Survey

255

28 Jul 2025

DeRAG: Black-box Adversarial Attacks on Multiple Retrieval-Augmented Generation Applications via Prompt Injection

Jerry Wang

Fang Yu

SILM AAML

109

20 Jul 2025

FlexOlmo: Open Language Models for Flexible Data Use

...

372

09 Jul 2025

Conventional Contrastive Learning Often Falls Short: Improving Dense Retrieval with Cross-Encoder Listwise Distillation and Synthetic Data

223

25 May 2025

S-DAT: A Multilingual, GenAI-Driven Framework for Automated Divergent Thinking Assessment

332

14 May 2025

SweRank: Software Issue Localization with Code Ranking

274

07 May 2025

Safety Pretraining: Toward the Next Generation of Safe AI

495

23 Apr 2025

Teaching Dense Retrieval Models to Specialize with Listwise Distillation and LLM Data Augmentation

311

27 Feb 2025

GASLITEing the Retrieval: Exploring Vulnerabilities in Dense Embedding-based Search

Matan Ben-Tov

Mahmood Sharif

RALM

525

30 Dec 2024

CoRNStack: High-Quality Contrastive Data for Better Code Retrieval and Reranking

International Conference on Learning Representations (ICLR), 2024

493

01 Dec 2024

Model Editing for LLMs4Code: How Far are We?

International Conference on Software Engineering (ICSE), 2024

289

11 Nov 2024

Qtok: A Comprehensive Framework for Evaluating Multilingual Tokenizer Quality in Large Language Models

Iaroslav Chelombitko

Egor Safronov

Aleksey Komissarov

205

16 Oct 2024

REFINE on Scarce Data: Retrieval Enhancement through Fine-Tuning via Model Fusion of Embedding Models

Applied Informatics (AI), 2024

195

16 Oct 2024

EPS-MoE: Expert Pipeline Scheduler for Cost-Efficient MoE Inference

318

16 Oct 2024

Efficient Pretraining Data Selection for Language Models via Multi-Actor Collaboration

Annual Meeting of the Association for Computational Linguistics (ACL), 2024

...

389

10 Oct 2024

IRSC: A Zero-shot Evaluation Benchmark for Information Retrieval through Semantic Comprehension in Retrieval-Augmented Generation Scenarios

206

24 Sep 2024

Promptriever: Instruction-Trained Retrievers Can Be Prompted Like Language Models

International Conference on Learning Representations (ICLR), 2024

202

17 Sep 2024

Enhancing Q&A Text Retrieval with Ranking Models: Benchmarking, fine-tuning and deploying Rerankers for RAG

Gabriel de Souza P. Moreira

189

12 Sep 2024

The Russian-focused embedders' exploration: ruMTEB benchmark and Russian embedding model design

North American Chapter of the Association for Computational Linguistics (NAACL), 2024

Alena Fenogenova

438

22 Aug 2024

NV-Retriever: Improving text embedding models with effective hard-negative mining

Gabriel de Souza P. Moreira

Mengyao Xu

343

22 Jul 2024

The 2024 Foundation Model Transparency Index

319

17 Jul 2024

The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale

378

551

25 Jun 2024

Ragnarök: A Reusable RAG Framework and Baselines for TREC 2024 Retrieval-Augmented Generation Track

Ronak Pradeep

Nandan Thakur

Sahel Sharifymoghaddam

Eric Zhang

Ryan Nguyen

Daniel Campos

Nick Craswell

Jimmy Lin

286

24 Jun 2024

Can't Hide Behind the API: Stealing Black-Box Commercial Embedding Models

667

13 Jun 2024