Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2402.03216
Cited By
BGE M3-Embedding: Multi-Lingual, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation
5 February 2024
Jianlv Chen
Shitao Xiao
Peitian Zhang
Kun Luo
Defu Lian
Zheng Liu
Re-assign community
ArXiv
PDF
HTML
Papers citing
"BGE M3-Embedding: Multi-Lingual, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation"
47 / 47 papers shown
Title
Tevatron 2.0: Unified Document Retrieval Toolkit across Scale, Language, and Modality
Xueguang Ma
Luyu Gao
Shengyao Zhuang
Jiaqi Samantha Zhan
Jamie Callan
Jimmy Lin
28
0
0
05 May 2025
Scalable Unit Harmonization in Medical Informatics Using Bi-directional Transformers and Bayesian-Optimized BM25 and Sentence Embedding Retrieval
Jordi de la Torre
32
0
0
01 May 2025
DNB-AI-Project at SemEval-2025 Task 5: An LLM-Ensemble Approach for Automated Subject Indexing
Lisa Kluge
Maximilian Kähler
67
1
0
30 Apr 2025
Are Information Retrieval Approaches Good at Harmonising Longitudinal Survey Questions in Social Science?
Wing Yan Li
Zeqiang Wang
Jon Johnson
Suparna De
32
0
0
29 Apr 2025
Antidote: A Unified Framework for Mitigating LVLM Hallucinations in Counterfactual Presupposition and Object Perception
Yuanchen Wu
Lu Zhang
Hang Yao
Junlong Du
Ke Yan
Shouhong Ding
Yunsheng Wu
X. Li
MLLM
66
0
0
29 Apr 2025
Can LLMs Detect Intrinsic Hallucinations in Paraphrasing and Machine Translation?
Evangelia Gogoulou
Shorouq Zahra
Liane Guillou
Luise Dürlich
Joakim Nivre
HILM
LRM
44
0
0
29 Apr 2025
Reconstructing Context: Evaluating Advanced Chunking Strategies for Retrieval-Augmented Generation
Carlo Merola
Jaspinder Singh
RALM
41
0
0
28 Apr 2025
Privacy-Preserving Federated Embedding Learning for Localized Retrieval-Augmented Generation
Qianren Mao
Qili Zhang
Hanwen Hao
Zhentao Han
Runhua Xu
...
Bo Li
Y. Song
Jin Dong
Jianxin Li
Philip S. Yu
61
0
0
27 Apr 2025
DualRAG: A Dual-Process Approach to Integrate Reasoning and Retrieval for Multi-Hop Question Answering
Rong Cheng
J. Liu
Yan Zheng
Fei Ni
Jiazhen Du
Hangyu Mao
Fuzheng Zhang
Bo-Lan Wang
Jianye Hao
LRM
48
0
0
25 Apr 2025
Secure Multifaceted-RAG for Enterprise: Hybrid Knowledge Retrieval with Security Filtering
Grace Byun
S. Lee
Nayoung Choi
Jinho D. Choi
25
0
0
18 Apr 2025
SemEval-2025 Task 5: LLMs4Subjects -- LLM-based Automated Subject Tagging for a National Technical Library's Open-Access Catalog
Jennifer D’Souza
Sameer Sadruddin
Holger Israel
Mathias Begoin
Diana Slawig
40
5
0
09 Apr 2025
LRAGE: Legal Retrieval Augmented Generation Evaluation Tool
Minhu Park
Hongseok Oh
Eunkyung Choi
Wonseok Hwang
AILaw
RALM
ELM
107
0
0
02 Apr 2025
CrossFormer: Cross-Segment Semantic Fusion for Document Segmentation
Tongke Ni
Yang Fan
Junru Zhou
Xiangping Wu
Qingcai Chen
27
0
0
31 Mar 2025
EuroBERT: Scaling Multilingual Encoders for European Languages
Nicolas Boizard
Hippolyte Gisserot-Boukhlef
Duarte M. Alves
André F. T. Martins
Ayoub Hammal
...
Maxime Peyrard
Nuno M. Guerreiro
Patrick Fernandes
Ricardo Rei
Pierre Colombo
31
0
0
07 Mar 2025
A Practical Memory Injection Attack against LLM Agents
Shen Dong
Shaocheng Xu
Pengfei He
Y. Li
Jiliang Tang
Tianming Liu
Hui Liu
Zhen Xiang
LLMAG
AAML
38
2
0
05 Mar 2025
Project Alexandria: Towards Freeing Scientific Knowledge from Copyright Burdens via LLMs
Christoph Schuhmann
Gollam Rabby
Ameya Prabhu
Tawsif Ahmed
Andreas Hochlehnert
...
Ludwig Schmidt
R. Kaczmarczyk
Sören Auer
J. Jitsev
Matthias Bethge
75
0
0
26 Feb 2025
Trustworthy Answers, Messier Data: Bridging the Gap in Low-Resource Retrieval-Augmented Generation for Domain Expert Systems
Nayoung Choi
Grace Byun
Andrew Chung
Ellie S. Paek
S. Lee
Jinho D. Choi
RALM
71
1
0
26 Feb 2025
REAL-MM-RAG: A Real-World Multi-Modal Retrieval Benchmark
Navve Wasserman
Roi Pony
O. Naparstek
Adi Raz Goldfarb
Eli Schwartz
Udi Barzelay
Leonid Karlinsky
3DV
VLM
60
1
0
17 Feb 2025
Following the Autoregressive Nature of LLM Embeddings via Compression and Alignment
Jingcheng Deng
Zhongtao Jiang
Liang Pang
Liwei Chen
Kun Xu
Zihao Wei
Huawei Shen
Xueqi Cheng
39
1
0
17 Feb 2025
Ask in Any Modality: A Comprehensive Survey on Multimodal Retrieval-Augmented Generation
Mohammad Mahdi Abootorabi
Amirhosein Zobeiri
Mahdi Dehghani
Mohammadali Mohammadkhani
Bardia Mohammadi
Omid Ghahroodi
M. Baghshah
Ehsaneddin Asgari
RALM
65
3
0
12 Feb 2025
Boosting Multimodal Reasoning with MCTS-Automated Structured Thinking
Jinyang Wu
Mingkuan Feng
Shuai Zhang
Ruihan Jin
Feihu Che
Zengqi Wen
J. Tao
LRM
36
7
0
04 Feb 2025
Can we Retrieve Everything All at Once? ARM: An Alignment-Oriented LLM-based Retrieval Method
Peter Baile Chen
Yi Zhang
Michael Cafarella
Dan Roth
RALM
AIFin
107
2
0
30 Jan 2025
Kimi k1.5: Scaling Reinforcement Learning with LLMs
Kimi Team
Angang Du
Bofei Gao
Bowei Xing
Changjiu Jiang
...
Zhilin Yang
Zhiqi Huang
Zihao Huang
Ziyao Xu
Z. Yang
VLM
ALM
OffRL
AI4TS
LRM
78
128
0
22 Jan 2025
ALoFTRAG: Automatic Local Fine Tuning for Retrieval Augmented Generation
Peter Devine
22
0
0
21 Jan 2025
Knowledge Retrieval Based on Generative AI
Te-Lun Yang
Jyi-Shane Liu
Yuen-Hsien Tseng
Jyh-Shing Roger Jang
3DV
RALM
38
1
0
17 Jan 2025
LLMs Model Non-WEIRD Populations: Experiments with Synthetic Cultural Agents
Augusto Gonzalez-Bonorino
Monica Capra
Emilio Pantoja
31
1
0
12 Jan 2025
LUSIFER: Language Universal Space Integration for Enhanced Multilingual Embeddings with Large Language Models
Hieu Man
Nghia Trung Ngo
Viet Dac Lai
Ryan Rossi
Franck Dernoncourt
T. Nguyen
41
0
0
01 Jan 2025
jina-clip-v2: Multilingual Multimodal Embeddings for Text and Images
Andreas Koukounas
Georgios Mastrapas
Bo Wang
Mohammad Kalim Akram
Sedigheh Eslami
Michael Gunther
Isabelle Mohr
Saba Sturua
Scott Martens
Nan Wang
VLM
78
6
0
11 Dec 2024
MIRAGE-Bench: Automatic Multilingual Benchmark Arena for Retrieval-Augmented Generation Systems
Nandan Thakur
Suleman Kazi
Ge Luo
Jimmy J. Lin
Amin Ahmad
VLM
RALM
21
6
0
17 Oct 2024
Revisiting Benchmark and Assessment: An Agent-based Exploratory Dynamic Evaluation Framework for LLMs
Wanying Wang
Zeyu Ma
Pengfei Liu
Mingang Chen
LLMAG
35
1
0
15 Oct 2024
FunnelRAG: A Coarse-to-Fine Progressive Retrieval Paradigm for RAG
X. Zhao
Yan Zhong
Zetian Sun
Xinshuo Hu
Zhenyu Liu
Dongfang Li
Baotian Hu
Min Zhang
16
6
0
14 Oct 2024
Beyond Exact Match: Semantically Reassessing Event Extraction by Large Language Models
Yi-Fan Lu
Xian-Ling Mao
Tian Lan
Heyan Huang
Heyan Huang
Xiaoyan Gao
33
0
0
12 Oct 2024
Wrong-of-Thought: An Integrated Reasoning Framework with Multi-Perspective Verification and Wrong Information
Yongheng Zhang
Qiguang Chen
Jingxuan Zhou
Peng Wang
Jiasheng Si
Jin Wang
Wenpeng Lu
Libo Qin
LRM
26
3
0
06 Oct 2024
FaithEval: Can Your Language Model Stay Faithful to Context, Even If "The Moon is Made of Marshmallows"
Yifei Ming
Senthil Purushwalkam
Shrey Pandit
Zixuan Ke
Xuan-Phi Nguyen
Caiming Xiong
Shafiq R. Joty
HILM
102
16
0
30 Sep 2024
RAGEval: Scenario Specific RAG Evaluation Dataset Generation Framework
Kunlun Zhu
Yifan Luo
Dingling Xu
Ruobing Wang
Shi Yu
...
Yishan Li
Zhiyuan Liu
Xu Han
Zhiyuan Liu
Maosong Sun
19
17
0
02 Aug 2024
INDIC QA BENCHMARK: A Multilingual Benchmark to Evaluate Question Answering capability of LLMs for Indic Languages
A. Singh
Rudra Murthy
Vishwajeet Kumar
Jaydeep Sen
Ashish Mittal
Ganesh Ramakrishnan
20
6
0
18 Jul 2024
LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models
Kaichen Zhang
Bo Li
Peiyuan Zhang
Fanyi Pu
Joshua Adrian Cahyono
...
Shuai Liu
Yuanhan Zhang
Jingkang Yang
Chunyuan Li
Ziwei Liu
79
73
0
17 Jul 2024
CoIR: A Comprehensive Benchmark for Code Information Retrieval Models
Xiangyang Li
Kuicai Dong
Yi Quan Lee
Wei Xia
Yichun Yin
Xinyi Dai
Yasheng Wang
Ruiming Tang
22
13
0
03 Jul 2024
ColPali: Efficient Document Retrieval with Vision Language Models
Manuel Faysse
Hugues Sibille
Tony Wu
Bilel Omrani
Gautier Viaud
C´eline Hudelot
Pierre Colombo
VLM
36
21
0
27 Jun 2024
RetroMAE: Pre-Training Retrieval-oriented Language Models Via Masked Auto-Encoder
Shitao Xiao
Zheng Liu
Yingxia Shao
Zhao Cao
RALM
112
105
0
24 May 2022
Text and Code Embeddings by Contrastive Pre-Training
Arvind Neelakantan
Tao Xu
Raul Puri
Alec Radford
Jesse Michael Han
...
Tabarak Khan
Toki Sherbakov
Joanne Jang
Peter Welinder
Lilian Weng
SSL
AI4TS
188
412
0
24 Jan 2022
RocketQAv2: A Joint Training Method for Dense Passage Retrieval and Passage Re-ranking
Ruiyang Ren
Yingqi Qu
Jing Liu
Wayne Xin Zhao
Qiaoqiao She
Hua-Hong Wu
Haifeng Wang
Ji-Rong Wen
121
244
0
14 Oct 2021
BEIR: A Heterogenous Benchmark for Zero-shot Evaluation of Information Retrieval Models
Nandan Thakur
Nils Reimers
Andreas Rucklé
Abhishek Srivastava
Iryna Gurevych
VLM
229
720
0
17 Apr 2021
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao
Stella Biderman
Sid Black
Laurence Golding
Travis Hoppe
...
Horace He
Anish Thite
Noa Nabeshima
Shawn Presser
Connor Leahy
AIMat
231
1,508
0
31 Dec 2020
RocketQA: An Optimized Training Approach to Dense Passage Retrieval for Open-Domain Question Answering
Yingqi Qu
Yuchen Ding
Jing Liu
Kai Liu
Ruiyang Ren
Xin Zhao
Daxiang Dong
Hua-Hong Wu
Haifeng Wang
RALM
OffRL
192
516
0
16 Oct 2020
Pretrained Transformers for Text Ranking: BERT and Beyond
Jimmy J. Lin
Rodrigo Nogueira
Andrew Yates
VLM
208
515
0
13 Oct 2020
PubMedQA: A Dataset for Biomedical Research Question Answering
Qiao Jin
Bhuwan Dhingra
Zhengping Liu
William W. Cohen
Xinghua Lu
193
791
0
13 Sep 2019
1