Arctic-Embed: Scalable, Efficient, and Accurate Text Embedding Models


8 May 2024
Luke Merrick
Danmei Xu
Gaurav Nuti
Daniel Campos

Papers citing "Arctic-Embed: Scalable, Efficient, and Accurate Text Embedding Models"

39 / 39 papers shown
From Topology to Retrieval: Decoding Embedding Spaces with Unified Signatures
Florian Rottach
William Rudman
Bastian Rieck
Harrisen Scells
Carsten Eickhoff
140
0
0
27 Nov 2025
Improving Romanian LLM Pretraining Data using Diversity and Quality Filtering
Vlad Negoita
Mihai Masala
Traian Rebedea
123
0
0
02 Nov 2025
GigaEmbeddings: Efficient Russian Language Embedding Model
Egor Kolodin
Daria Khomich
Nikita Savushkin
Anastasia Ianina
Fyodor Minkin
124
0
0
25 Oct 2025
CoRECT: A Framework for Evaluating Embedding Compression Techniques at Scale
L. Caspari
M. Dinzinger
K. Ghosh Dastidar
C. Fellicious
J. Mitrović
M. Granitzer
148
0
0
22 Oct 2025
MOSAIC: Masked Objective with Selective Adaptation for In-domain Contrastive Learning
Vera Pavlova
Mohammed Makhlouf
CLL
152
0
0
19 Oct 2025
Fantastic (small) Retrievers and How to Train Them: mxbai-edge-colbert-v0 Tech Report
Rikiya Takehi
Benjamin Clavié
Sean Lee
Aamir Shakir
VLM
106
1
0
16 Oct 2025
DMRetriever: A Family of Models for Improved Text Retrieval in Disaster Management
Kai Yin
Xiangjue Dong
Chengkai Liu
Allen Lin
Lingfeng Shi
Ali Mostafavi
James Caverlee
VLM
133
0
0
16 Oct 2025
SHANKS: Simultaneous Hearing and Thinking for Spoken Language Models
Cheng-Han Chiang
Xiaofei Wang
Linjie Li
Chung-Ching Lin
Kevin Qinghong Lin
Shujie Liu
Zhendong Wang
Zhengyuan Yang
Hung-yi Lee
Lijuan Wang
LLMAG, ReLM, RALM, LRM
184
3
0
08 Oct 2025
Compressed Concatenation of Small Embedding Models
M. Ayoub Ben Ayad
Michael Dinzinger
Kanishka Ghosh Dastidar
Jelena Mitrović
Michael Granitzer
104
0
0
06 Oct 2025
The Data-Quality Illusion: Rethinking Classifier-Based Quality Filtering for LLM Pretraining
Thiziri Nait Saada
Louis Béthune
Michal Klein
David Grangier
Marco Cuturi
Pierre Ablin
145
1
0
01 Oct 2025
LEAF: Knowledge Distillation of Text Embedding Models with Teacher-Aligned Representations
Robin Vujanic
Thomas Rueckstiess
116
2
0
16 Sep 2025
How to Evaluate Medical AI
Ilia Kopanichuk
Petr Anokhin
V. Shaposhnikov
Vladimir Makharev
Ekaterina Tsapieva
Iaroslav Bespalov
Dmitry V. Dylov
Ivan Oseledets
ELM
215
1
0
15 Sep 2025
Boosting Data Utilization for Multilingual Dense Retrieval
Chao Huang
Fengran Mo
Yufeng Chen
Changhao Guan
Zhenrui Yue
Xinyu Wang
Jinan Xu
Kaiyu Huang
141
2
0
11 Sep 2025
Chronological Passage Assembling in RAG framework for Temporal Question Answering
Byeongjeong Kim
Jeonghyun Park
Joonho Yang
Hwanhee Lee
107
2
0
26 Aug 2025
Retrieval Capabilities of Large Language Models Scale with Pretraining FLOPs
Jacob P. Portes
Connor Jennings
Erica Ji Yuen
Sasha Doubov
Michael Carbin
RALM, LRM, ELM
140
1
0
24 Aug 2025
On The Role of Pretrained Language Models in General-Purpose Text Embeddings: A Survey
Meishan Zhang
Xin Zhang
X. Zhao
Shouzheng Huang
Baotian Hu
Min Zhang
255
3
0
28 Jul 2025
DeRAG: Black-box Adversarial Attacks on Multiple Retrieval-Augmented Generation Applications via Prompt Injection
Jerry Wang
Fang Yu
SILM, AAML
109
1
0
20 Jul 2025
FlexOlmo: Open Language Models for Flexible Data Use
Weijia Shi
Akshita Bhagia
Kevin Farhat
Niklas Muennighoff
Pete Walsh
...
Luke Zettlemoyer
Pang Wei Koh
Hannaneh Hajishirzi
Ali Farhadi
Sewon Min
MoE
372
4
0
09 Jul 2025
Conventional Contrastive Learning Often Falls Short: Improving Dense Retrieval with Cross-Encoder Listwise Distillation and Synthetic Data
Manveer Singh Tamber
Suleman Kazi
Vivek Sourabh
Jimmy Lin
223
1
0
25 May 2025
S-DAT: A Multilingual, GenAI-Driven Framework for Automated Divergent Thinking Assessment
J. Haase
P. Hanel
Sebastian Pokutta
LRM
332
4
0
14 May 2025
SweRank: Software Issue Localization with Code Ranking
R. Reddy
Tarun Suresh
JaeHyeok Doo
Wenshu Fan
Xuan-Phi Nguyen
Yingbo Zhou
Semih Yavuz
Caiming Xiong
Heng Ji
Shafiq Joty
274
9
0
07 May 2025
Safety Pretraining: Toward the Next Generation of Safe AI
Pratyush Maini
Sachin Goyal
Dylan Sam
Alex Robey
Yash Savani
Yiding Jiang
Andy Zou
Zachary C. Lipton
J. Zico Kolter
495
17
0
23 Apr 2025
Teaching Dense Retrieval Models to Specialize with Listwise Distillation and LLM Data Augmentation
Manveer Singh Tamber
Suleman Kazi
Vivek Sourabh
Jimmy J. Lin
311
2
0
27 Feb 2025
GASLITEing the Retrieval: Exploring Vulnerabilities in Dense Embedding-based Search
Matan Ben-Tov
Mahmood Sharif
RALM
525
4
0
30 Dec 2024
CoRNStack: High-Quality Contrastive Data for Better Code Retrieval and Reranking
International Conference on Learning Representations (ICLR), 2024
Tarun Suresh
R. Reddy
Yifei Xu
Zach Nussbaum
Andriy Mulyar
Brandon Duderstadt
Heng Ji
493
1
0
01 Dec 2024
Model Editing for LLMs4Code: How Far are We?
International Conference on Software Engineering (ICSE), 2024
Xiaopeng Li
Shasha Li
Shan Zhao
Jun Ma
Jie Yu
Xiaodong Liu
Jing Wang
Shezheng Song
Weimin Zhang
KELM
289
9
0
11 Nov 2024
Qtok: A Comprehensive Framework for Evaluating Multilingual Tokenizer Quality in Large Language Models
Iaroslav Chelombitko
Egor Safronov
Aleksey Komissarov
205
3
0
16 Oct 2024
REFINE on Scarce Data: Retrieval Enhancement through Fine-Tuning via Model Fusion of Embedding Models
Applied Informatics (AI), 2024
Ambuje Gupta
Mrinal Rawat
Andreas Stolcke
Roberto Pieraccini
RALM
195
1
0
16 Oct 2024
EPS-MoE: Expert Pipeline Scheduler for Cost-Efficient MoE Inference
Yulei Qian
Fengcun Li
Xiangyang Ji
Xiaoyu Zhao
Jianchao Tan
Xunliang Cai
MoE
318
8
0
16 Oct 2024
Efficient Pretraining Data Selection for Language Models via Multi-Actor Collaboration
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Tianyi Bai
Ling Yang
Zhen Hao Wong
Fupeng Sun
Jiahui Peng
...
Lijun Wu
Jiantao Qiu
Wentao Zhang
Binhang Yuan
Conghui He
LLMAG
389
6
0
10 Oct 2024
IRSC: A Zero-shot Evaluation Benchmark for Information Retrieval through Semantic Comprehension in Retrieval-Augmented Generation Scenarios
Hai Lin
Shaoxiong Zhan
Junyou Su
Haitao Zheng
Hui Wang
RALM
206
3
0
24 Sep 2024
Promptriever: Instruction-Trained Retrievers Can Be Prompted Like Language Models
International Conference on Learning Representations (ICLR), 2024
Orion Weller
Benjamin Van Durme
Dawn J Lawrie
Ashwin Paranjape
Yuhao Zhang
Jack Hessel
LRM, RALM
202
41
0
17 Sep 2024
Enhancing Q&A Text Retrieval with Ranking Models: Benchmarking, fine-tuning and deploying Rerankers for RAG
Gabriel de Souza P. Moreira
Ronay Ak
Benedikt Schifferer
Mengyao Xu
Radek Osmulski
Even Oldridge
189
14
0
12 Sep 2024
The Russian-focused embedders' exploration: ruMTEB benchmark and Russian embedding model design
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Artem Snegirev
Maria Tikhonova
Anna Maksimova
Alena Fenogenova
Alexander Abramov
438
18
0
22 Aug 2024
NV-Retriever: Improving text embedding models with effective hard-negative mining
Gabriel de Souza P. Moreira
Radek Osmulski
Mengyao Xu
Ronay Ak
Benedikt Schifferer
Even Oldridge
RALM
343
74
0
22 Jul 2024
The 2024 Foundation Model Transparency Index
Rishi Bommasani
Kevin Klyman
Sayash Kapoor
Shayne Longpre
Betty Xiong
Nestor Maslej
Abigail Z. Jacobs
ELM
319
5
0
17 Jul 2024
The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale
Guilherme Penedo
Hynek Kydlíček
Loubna Ben Allal
Anton Lozhkov
Margaret Mitchell
Colin Raffel
Leandro von Werra
Thomas Wolf
378
551
0
25 Jun 2024
Ragnarök: A Reusable RAG Framework and Baselines for TREC 2024 Retrieval-Augmented Generation Track
Ronak Pradeep
Nandan Thakur
Sahel Sharifymoghaddam
Eric Zhang
Ryan Nguyen
Daniel Campos
Nick Craswell
Jimmy Lin
286
31
0
24 Jun 2024
Can't Hide Behind the API: Stealing Black-Box Commercial Embedding Models
Manveer Singh Tamber
Jasper Xian
Jimmy Lin
MLAU, SILM
667
5
0
13 Jun 2024