2212.05221
REVEAL: Retrieval-Augmented Visual-Language Pre-Training with Multi-Source Multimodal Knowledge Memory
10 December 2022
Ziniu Hu
Ahmet Iscen
Chen Sun
Zirui Wang
Kai-Wei Chang
Yizhou Sun
Cordelia Schmid
David A. Ross
Alireza Fathi
RALM
VLM
Papers citing
"REVEAL: Retrieval-Augmented Visual-Language Pre-Training with Multi-Source Multimodal Knowledge Memory"
21 / 21 papers shown
Title
Ask in Any Modality: A Comprehensive Survey on Multimodal Retrieval-Augmented Generation
Mohammad Mahdi Abootorabi
Amirhosein Zobeiri
Mahdi Dehghani
Mohammadali Mohammadkhani
Bardia Mohammadi
Omid Ghahroodi
M. Baghshah
Ehsaneddin Asgari
RALM
98
4
0
12 Feb 2025
Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent
Yangning Li
Yinghui Li
Xinyu Wang
Yong-feng Jiang
Zhen Zhang
...
Hui Wang
Hai-Tao Zheng
Pengjun Xie
Philip S. Yu
Fei Huang
60
15
0
05 Nov 2024
VisRAG: Vision-based Retrieval-augmented Generation on Multi-modality Documents
S. Yu
C. Tang
Bokai Xu
Junbo Cui
Junhao Ran
...
Zhenghao Liu
Shuo Wang
Xu Han
Zhiyuan Liu
Maosong Sun
VLM
37
22
0
14 Oct 2024
Fact-Aware Multimodal Retrieval Augmentation for Accurate Medical Radiology Report Generation
Liwen Sun
James Zhao
Megan Han
Chenyan Xiong
MedIm
45
7
0
21 Jul 2024
Reminding Multimodal Large Language Models of Object-aware Knowledge with Retrieved Tags
Daiqing Qi
Handong Zhao
Zijun Wei
Sheng Li
35
2
0
16 Jun 2024
A Textbook Remedy for Domain Shifts: Knowledge Priors for Medical Image Analysis
Yue Yang
Mona Gandhi
Yufei Wang
Yifan Wu
Michael S. Yao
Christopher Callison-Burch
James C. Gee
Mark Yatskar
38
3
0
23 May 2024
Detecting Multimodal Situations with Insufficient Context and Abstaining from Baseless Predictions
Junzhang Liu
Zhecan Wang
Hammad A. Ayyubi
Haoxuan You
Chris Thomas
Rui Sun
Shih-Fu Chang
Kai-Wei Chang
29
0
0
18 May 2024
A Survey on RAG Meeting LLMs: Towards Retrieval-Augmented Large Language Models
Wenqi Fan
Yujuan Ding
Liang-bo Ning
Shijie Wang
Hengyun Li
Dawei Yin
Tat-Seng Chua
Qing Li
RALM
3DV
38
178
0
10 May 2024
PDF-MVQA: A Dataset for Multimodal Information Retrieval in PDF-based Visual Question Answering
Yihao Ding
Kaixuan Ren
Jiabin Huang
Siwen Luo
S. Han
35
1
0
19 Apr 2024
UniIR: Training and Benchmarking Universal Multimodal Information Retrievers
Cong Wei
Yang Chen
Haonan Chen
Hexiang Hu
Ge Zhang
Jie Fu
Alan Ritter
Wenhu Chen
28
50
0
28 Nov 2023
InternLM-XComposer: A Vision-Language Large Model for Advanced Text-image Comprehension and Composition
Pan Zhang
Xiaoyi Wang
Bin Wang
Yuhang Cao
Chao Xu
...
Conghui He
Xingcheng Zhang
Yu Qiao
Da Lin
Jiaqi Wang
MLLM
61
222
0
26 Sep 2023
Cross-Modal Retrieval Meets Inference: Improving Zero-Shot Classification with Cross-Modal Retrieval
Seong-Hoon Eom
Namgyu Ho
Jaehoon Oh
Se-Young Yun
CLIP
VLM
23
0
0
29 Aug 2023
Linear Alignment of Vision-language Models for Image Captioning
Fabian Paischer
M. Hofmarcher
Sepp Hochreiter
Thomas Adler
CLIP
VLM
40
0
0
10 Jul 2023
ViperGPT: Visual Inference via Python Execution for Reasoning
Dídac Surís
Sachit Menon
Carl Vondrick
MLLM
LRM
ReLM
40
428
0
14 Mar 2023
TPU-KNN: K Nearest Neighbor Search at Peak FLOP/s
Felix Chern
Blake A. Hechtman
Andy Davis
Ruiqi Guo
David Majnemer
Surinder Kumar
94
22
0
28 Jun 2022
Improving Multi-Task Generalization via Regularizing Spurious Correlation
Ziniu Hu
Zhe Zhao
Xinyang Yi
Tiansheng Yao
Lichan Hong
Yizhou Sun
Ed H. Chi
OOD
LRM
83
29
0
19 May 2022
An Empirical Study of GPT-3 for Few-Shot Knowledge-Based VQA
Zhengyuan Yang
Zhe Gan
Jianfeng Wang
Xiaowei Hu
Yumao Lu
Zicheng Liu
Lijuan Wang
169
402
0
10 Sep 2021
WIT: Wikipedia-based Image Text Dataset for Multimodal Multilingual Machine Learning
Krishna Srinivasan
K. Raman
Jiecao Chen
Michael Bendersky
Marc Najork
VLM
197
308
0
02 Mar 2021
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts
Soravit Changpinyo
P. Sharma
Nan Ding
Radu Soricut
VLM
273
1,077
0
17 Feb 2021
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
VLM
CLIP
293
3,683
0
11 Feb 2021
Distilling Knowledge from Reader to Retriever for Question Answering
Gautier Izacard
Edouard Grave
RALM
180
251
0
08 Dec 2020