ResearchTrend.AI
UniIR: Training and Benchmarking Universal Multimodal Information Retrievers

28 November 2023
Cong Wei
Yang Chen
Haonan Chen
Hexiang Hu
Ge Zhang
Jie Fu
Alan Ritter
Wenhu Chen

Papers citing "UniIR: Training and Benchmarking Universal Multimodal Information Retrievers"

43 / 43 papers shown
Fill the Gap: Quantifying and Reducing the Modality Gap in Image-Text Representation Learning
François Role
Sébastien Meyer
Victor Amblard
VLM
33
0
0
06 May 2025
UniversalRAG: Retrieval-Augmented Generation over Multiple Corpora with Diverse Modalities and Granularities
Woongyeong Yeo
Kangsan Kim
Soyeong Jeong
Jinheon Baek
S. Hwang
44
0
0
29 Apr 2025
Breaking the Modality Barrier: Universal Embedding Learning with Multimodal LLMs
Tiancheng Gu
Kaicheng Yang
Ziyong Feng
Xingjun Wang
Yanzhao Zhang
Dingkun Long
Yingda Chen
Weidong Cai
Jiankang Deng
VLM
73
0
0
24 Apr 2025
MIEB: Massive Image Embedding Benchmark
Chenghao Xiao
Isaac Chung
Imene Kerboua
Jamie Stirling
Xin Zhang
Márton Kardos
Roman Solomatin
Noura Al Moubayed
K. Enevoldsen
Niklas Muennighoff
VLM
35
0
0
14 Apr 2025
IDMR: Towards Instance-Driven Precise Visual Correspondence in Multimodal Retrieval
Bangwei Liu
Yicheng Bao
Shaohui Lin
Xuhong Wang
Xin Tan
Y. Wang
Yuan Xie
Chaochao Lu
53
0
0
01 Apr 2025
Poisoned-MRAG: Knowledge Poisoning Attacks to Multimodal Retrieval Augmented Generation
Yinuo Liu
Zenghui Yuan
Guiyao Tie
Jiawen Shi
Lichao Sun
Neil Zhenqiang Gong
36
1
0
08 Mar 2025
Qilin: A Multimodal Information Retrieval Dataset with APP-level User Sessions
Jia Chen
Qian Dong
Haitao Li
Xiaohui He
Yan Gao
...
Ping Yang
Chen Xu
Yao Hu
Qingyao Ai
Y. Liu
32
0
0
01 Mar 2025
ABC: Achieving Better Control of Multimodal Embeddings using VLMs
Benjamin Schneider
Florian Kerschbaum
Wenhu Chen
41
0
0
01 Mar 2025
SuperRAG: Beyond RAG with Layout-Aware Graph Modeling
Jeff Yang
Duy-Khanh Vu
Minh-Tien Nguyen
Xuan-Quang Nguyen
Linh Nguyen
H. Le
3DV
63
0
0
28 Feb 2025
Joint Fusion and Encoding: Advancing Multimodal Retrieval from the Ground Up
Lang Huang
Qiyu Wu
Zhongtao Miao
T. Yamasaki
50
0
0
27 Feb 2025
ELIP: Enhanced Visual-Language Foundation Models for Image Retrieval
Guanqi Zhan
Yuanpei Liu
Kai Han
Weidi Xie
Andrew Zisserman
VLM
60
0
0
21 Feb 2025
A Survey of Model Architectures in Information Retrieval
Zhichao Xu
Fengran Mo
Zhiqi Huang
Crystina Zhang
Puxuan Yu
Bei Wang
Jimmy J. Lin
Vivek Srikumar
KELM
3DV
46
2
0
21 Feb 2025
Any Information Is Just Worth One Single Screenshot: Unifying Search With Visualized Information Retrieval
Ze Liu
Zhengyang Liang
Junjie Zhou
Zheng Liu
Defu Lian
OffRL
51
0
0
17 Feb 2025
Ask in Any Modality: A Comprehensive Survey on Multimodal Retrieval-Augmented Generation
Mohammad Mahdi Abootorabi
Amirhosein Zobeiri
Mahdi Dehghani
Mohammadali Mohammadkhani
Bardia Mohammadi
Omid Ghahroodi
M. Baghshah
Ehsaneddin Asgari
RALM
87
3
0
12 Feb 2025
VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks
Ziyan Jiang
Rui Meng
Xinyi Yang
Semih Yavuz
Yingbo Zhou
Wenhu Chen
MLLM
VLM
45
18
0
03 Jan 2025
GME: Improving Universal Multimodal Retrieval by Multimodal LLMs
Xin Zhang
Yanzhao Zhang
Wen Xie
Mingxin Li
Ziqi Dai
Dingkun Long
Pengjun Xie
Meishan Zhang
Wenjie Li
M. Zhang
102
7
0
22 Dec 2024
CUE-M: Contextual Understanding and Enhanced Search with Multimodal Large Language Model
Dongyoung Go
Taesun Whang
Chanhee Lee
Hwayeon Kim
Sunghoon Park
Seunghwan Ji
Dongchan Kim
Young-Bum Kim
LRM
81
1
0
19 Nov 2024
MM-Embed: Universal Multimodal Retrieval with Multimodal LLMs
Sheng-Chieh Lin
Chankyu Lee
M. Shoeybi
Jimmy J. Lin
Bryan Catanzaro
Wei Ping
53
10
0
04 Nov 2024
UniHGKR: Unified Instruction-aware Heterogeneous Knowledge Retrievers
Dehai Min
Zhiyang Xu
Guilin Qi
Lifu Huang
Chenyu You
RALM
56
1
0
26 Oct 2024
Visual Text Matters: Improving Text-KVQA with Visual Text Entity Knowledge-aware Large Multimodal Assistant
A. S. Penamakuri
Anand Mishra
14
0
0
24 Oct 2024
VisRAG: Vision-based Retrieval-augmented Generation on Multi-modality Documents
S. Yu
C. Tang
Bokai Xu
Junbo Cui
Junhao Ran
...
Zhenghao Liu
Shuo Wang
Xu Han
Zhiyuan Liu
Maosong Sun
VLM
29
21
0
14 Oct 2024
MLLM as Retriever: Interactively Learning Multimodal Retrieval for Embodied Agents
Junpeng Yue
Xinru Xu
Börje F. Karlsson
Zongqing Lu
32
0
0
04 Oct 2024
PhishAgent: A Robust Multimodal Agent for Phishing Webpage Detection
Tri Cao
Chengyu Huang
Yuexin Li
Huilin Wang
Amy He
Nay Oo
Bryan Hooi
LLMAG
OffRL
59
4
0
20 Aug 2024
E5-V: Universal Embeddings with Multimodal Large Language Models
Ting Jiang
Minghui Song
Zihan Zhang
Haizhen Huang
Weiwei Deng
Feng Sun
Qi Zhang
Deqing Wang
Fuzhen Zhuang
VLM
21
19
0
17 Jul 2024
SK-VQA: Synthetic Knowledge Generation at Scale for Training Context-Augmented Multimodal LLMs
Xin Su
Man Luo
Kris W Pan
Tien Pei Chou
Vasudev Lal
Phillip Howard
27
3
0
28 Jun 2024
MATE: Meet At The Embedding -- Connecting Images with Long Texts
Young Kyun Jang
Junmo Kang
Yong Jae Lee
Donghyun Kim
VLM
18
5
0
26 Jun 2024
VISTA: Visualized Text Embedding For Universal Multi-Modal Retrieval
Junjie Zhou
Zheng Liu
Shitao Xiao
Bo Zhao
Yongping Xiong
31
20
0
06 Jun 2024
Beyond Relevance: Evaluate and Improve Retrievers on Perspective Awareness
Xinran Zhao
Tong Chen
Sihao Chen
Hongming Zhang
Tongshuang Wu
21
7
0
04 May 2024
Wiki-LLaVA: Hierarchical Retrieval-Augmented Generation for Multimodal LLMs
Davide Caffagni
Federico Cocchi
Nicholas Moratelli
Sara Sarto
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
KELM
24
34
0
23 Apr 2024
MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions
Kai Zhang
Yi Luan
Hexiang Hu
Kenton Lee
Siyuan Qiao
Wenhu Chen
Yu-Chuan Su
Ming-Wei Chang
VLM
LRM
23
32
0
28 Mar 2024
INSTRUCTIR: A Benchmark for Instruction Following of Information Retrieval Models
Hanseok Oh
Hyunji Lee
Seonghyeon Ye
Haebin Shin
Hansol Jang
Changwook Jun
Minjoon Seo
20
19
0
22 Feb 2024
PreFLMR: Scaling Up Fine-Grained Late-Interaction Multi-modal Retrievers
Weizhe Lin
Jingbiao Mei
Jinghong Chen
Bill Byrne
VLM
AI4Ed
63
4
0
13 Feb 2024
SciMMIR: Benchmarking Scientific Multi-modal Information Retrieval
Siwei Wu
Yizhi Li
Kang Zhu
Ge Zhang
Yiming Liang
...
Wenhu Chen
Wenhao Huang
Noura Al Moubayed
Jie Fu
Chenghua Lin
14
11
0
24 Jan 2024
MagicBrush: A Manually Annotated Dataset for Instruction-Guided Image Editing
Kai Zhang
Lingbo Mo
Wenhu Chen
Huan Sun
Yu-Chuan Su
EGVM
99
235
0
16 Jun 2023
EDIS: Entity-Driven Image Search over Multimodal Web Content
Siqi Liu
Weixi Feng
Tsu-jui Fu
Wenhu Chen
W. Wang
VLM
28
9
0
23 May 2023
Open-domain Visual Entity Recognition: Towards Recognizing Millions of Wikipedia Entities
Hexiang Hu
Yi Luan
Yang Chen
Urvashi Khandelwal
Mandar Joshi
Kenton Lee
Kristina Toutanova
Ming-Wei Chang
VLM
40
54
0
22 Feb 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM
MLLM
244
4,186
0
30 Jan 2023
Re-Imagen: Retrieval-Augmented Text-to-Image Generator
Wenhu Chen
Hexiang Hu
Chitwan Saharia
William W. Cohen
VLM
114
159
0
29 Sep 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
301
11,730
0
04 Mar 2022
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li
Dongxu Li
Caiming Xiong
S. Hoi
MLLM
BDL
VLM
CLIP
380
4,010
0
28 Jan 2022
MURAL: Multimodal, Multitask Retrieval Across Languages
Aashi Jain
Mandy Guo
Krishna Srinivasan
Ting-Li Chen
Sneha Kudugunta
Chao Jia
Yinfei Yang
Jason Baldridge
VLM
109
52
0
10 Sep 2021
BEIR: A Heterogenous Benchmark for Zero-shot Evaluation of Information Retrieval Models
Nandan Thakur
Nils Reimers
Andreas Rucklé
Abhishek Srivastava
Iryna Gurevych
VLM
229
720
0
17 Apr 2021
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
VLM
CLIP
293
2,875
0
11 Feb 2021