Nomic Embed: Training a Reproducible Long Context Text Embedder

2 February 2024

Papers citing "Nomic Embed: Training a Reproducible Long Context Text Embedder"

21 / 21 papers shown

Title
Leveraging Vision-Language Models for Visual Grounding and Analysis of Automotive UI Benjamin Raphael Ernhofer Daniil Prokhorov Jannica Langner Dominik Bollmann 25 0 0 09 May 2025
R&B: Domain Regrouping and Data Mixture Balancing for Efficient Foundation Model Training Albert Ge Tzu-Heng Huang John Cooper Avi Trost Ziyi Chu Satya Sai Srinath Namburi GNVV Ziyang Cai Kendall Park Nicholas Roberts Frederic Sala 53 0 0 01 May 2025
SemEval-2025 Task 5: LLMs4Subjects -- LLM-based Automated Subject Tagging for a National Technical Library's Open-Access Catalog Jennifer D’Souza Sameer Sadruddin Holger Israel Mathias Begoin Diana Slawig 52 5 0 09 Apr 2025
Can LLM-Driven Hard Negative Sampling Empower Collaborative Filtering? Findings and Potentials Chu Zhao Enneng Yang Yuting Liu Jianzhe Zhao G. Guo Xingwei Wang 28 0 0 07 Apr 2025
Beyond Matryoshka: Revisiting Sparse Coding for Adaptive Representation Tiansheng Wen Yifei Wang Zequn Zeng Zhong Peng Yudi Su Xinyang Liu Bo Chen Hongwei Liu Stefanie Jegelka Chenyu You CLL 56 2 0 03 Mar 2025
Enhancing Domain-Specific Retrieval-Augmented Generation: Synthetic Data Generation and Evaluation using Reasoning Models Aryan Jadon Avinash Patil Shashank Kumar SyDa 39 1 0 21 Feb 2025
Mind the Gap! Choice Independence in Using Multilingual LLMs for Persuasive Co-Writing Tasks in Different Languages Shreyan Biswas Alexander Erlei U. Gadiraju 103 4 0 13 Feb 2025
Consistent estimation of generative model representations in the data kernel perspective space Aranyak Acharyya M. Trosset Carey E. Priebe Hayden Helm DiffM 54 3 0 20 Jan 2025
ChemTEB: Chemical Text Embedding Benchmark, an Overview of Embedding Models Performance & Efficiency on a Specific Domain Ali Shiraee Kasmaee Mohammad Khodadad Mohammad Arshi Saloot Nick Sherck Stephen Dokas H. Mahyar Soheila Samiee ELM 97 0 0 30 Nov 2024
Reasoning-Enhanced Healthcare Predictions with Knowledge Graph Community Retrieval Pengcheng Jiang Cao Xiao Minhao Jiang Parminder Bhatia Taha A. Kass-Hout Jimeng Sun Jiawei Han RALM AI4MH 43 4 0 06 Oct 2024
Better Instruction-Following Through Minimum Bayes Risk Ian Wu Patrick Fernandes Amanda Bertsch Seungone Kim Sina Pakazad Graham Neubig 48 9 0 03 Oct 2024
Understanding Generative AI Content with Embedding Models Max Vargas Reilly Cannon A. Engel Anand D. Sarwate Tony Chiang 42 3 0 19 Aug 2024
ChatQA 2: Bridging the Gap to Proprietary LLMs in Long Context and RAG Capabilities Peng-Tao Xu Wei Ping Xianchao Wu Zihan Liu M. Shoeybi Mohammad Shoeybi Bryan Catanzaro RALM 44 14 0 19 Jul 2024
LongRAG: Enhancing Retrieval-Augmented Generation with Long-context LLMs Ziyan Jiang Xueguang Ma Wenhu Chen RALM 41 47 0 21 Jun 2024
Can't Hide Behind the API: Stealing Black-Box Commercial Embedding Models Manveer Singh Tamber Jasper Xian Jimmy Lin MLAU SILM 78 0 0 13 Jun 2024
High-Dimension Human Value Representation in Large Language Models Samuel Cahyawijaya Delong Chen Yejin Bang Leila Khalatbari Bryan Wilie Ziwei Ji Etsuko Ishii Pascale Fung 63 5 0 11 Apr 2024
Binary Classifier Optimization for Large Language Model Alignment Seungjae Jung Gunsoo Han D. W. Nam Kyoung-Woon On 29 20 0 06 Apr 2024
The First Place Solution of WSDM Cup 2024: Leveraging Large Language Models for Conversational Multi-Doc QA Yiming Li Zhao Zhang 21 1 0 28 Feb 2024
BEIR: A Heterogenous Benchmark for Zero-shot Evaluation of Information Retrieval Models Nandan Thakur Nils Reimers Andreas Rucklé Abhishek Srivastava Iryna Gurevych VLM 229 961 0 17 Apr 2021
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism M. Shoeybi M. Patwary Raul Puri P. LeGresley Jared Casper Bryan Catanzaro MoE 243 1,791 0 17 Sep 2019
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding Alex Jinpeng Wang Amanpreet Singh Julian Michael Felix Hill Omer Levy Samuel R. Bowman ELM 294 6,927 0 20 Apr 2018