Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2302.11713
Cited By
Can Pre-trained Vision and Language Models Answer Visual Information-Seeking Questions?
23 February 2023
Yang Chen
Hexiang Hu
Yi Luan
Haitian Sun
Soravit Changpinyo
Alan Ritter
Ming-Wei Chang
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Can Pre-trained Vision and Language Models Answer Visual Information-Seeking Questions?"
50 / 60 papers shown
Title
Summarization of Multimodal Presentations with Vision-Language Models: Study of the Effect of Modalities and Structure
Théo Gigant
Camille Guinaudeau
Frédéric Dufaux
24
0
0
14 Apr 2025
MIEB: Massive Image Embedding Benchmark
Chenghao Xiao
Isaac Chung
Imene Kerboua
Jamie Stirling
Xin Zhang
Márton Kardos
Roman Solomatin
Noura Al Moubayed
K. Enevoldsen
Niklas Muennighoff
VLM
35
0
0
14 Apr 2025
Retrieval Augmented Generation and Understanding in Vision: A Survey and New Outlook
Xu Zheng
Ziqiao Weng
Yuanhuiyi Lyu
Lutao Jiang
Haiwei Xue
Bin Ren
Danda Pani Paudel
N. Sebe
Luc Van Gool
Xuming Hu
3DV
37
1
0
23 Mar 2025
Poisoned-MRAG: Knowledge Poisoning Attacks to Multimodal Retrieval Augmented Generation
Yinuo Liu
Zenghui Yuan
Guiyao Tie
Jiawen Shi
Lichao Sun
Lichao Sun
Neil Zhenqiang Gong
36
1
0
08 Mar 2025
Fine-Grained Retrieval-Augmented Generation for Visual Question Answering
Zhengxuan Zhang
Yin Wu
Yuyu Luo
Nan Tang
33
0
0
28 Feb 2025
Joint Fusion and Encoding: Advancing Multimodal Retrieval from the Ground Up
Lang Huang
Qiyu Wu
Zhongtao Miao
T. Yamasaki
53
0
0
27 Feb 2025
Detecting Knowledge Boundary of Vision Large Language Models by Sampling-Based Inference
Zhuo Chen
Xinyu Wang
Yong-feng Jiang
Zhen Zhang
Xinyu Geng
Pengjun Xie
Fei Huang
Kewei Tu
98
0
0
25 Feb 2025
Visual-RAG: Benchmarking Text-to-Image Retrieval Augmented Generation for Visual Knowledge Intensive Queries
Yin Wu
Quanyu Long
Jing Li
Jianfei Yu
Wenya Wang
VLM
39
1
0
23 Feb 2025
Retrieval-Augmented Visual Question Answering via Built-in Autoregressive Search Engines
Xinwei Long
Zhiyuan Ma
Ermo Hua
Kaiyan Zhang
Biqing Qi
Bowen Zhou
RALM
46
0
0
23 Feb 2025
LOVA3: Learning to Visual Question Answering, Asking and Assessment
Henry Hengyuan Zhao
Pan Zhou
Difei Gao
Zechen Bai
Mike Zheng Shou
71
8
0
21 Feb 2025
Generative Landmarks Guided Eyeglasses Removal 3D Face Reconstruction
Dapeng Zhao
Yue Qi
3DH
CVBM
3DV
27
0
0
31 Dec 2024
GME: Improving Universal Multimodal Retrieval by Multimodal LLMs
Xin Zhang
Yanzhao Zhang
Wen Xie
Mingxin Li
Ziqi Dai
Dingkun Long
Pengjun Xie
Meishan Zhang
Wenjie Li
M. Zhang
111
7
0
22 Dec 2024
Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent
Yangning Li
Yinghui Li
Xinyu Wang
Yong-feng Jiang
Zhen Zhang
...
Hui Wang
Hai-Tao Zheng
Pengjun Xie
Philip S. Yu
Fei Huang
60
15
0
05 Nov 2024
MM-Embed: Universal Multimodal Retrieval with Multimodal LLMs
Sheng-Chieh Lin
Chankyu Lee
M. Shoeybi
Jimmy J. Lin
Bryan Catanzaro
Wei Ping
57
10
0
04 Nov 2024
Web-Scale Visual Entity Recognition: An LLM-Driven Data Approach
Mathilde Caron
Alireza Fathi
Cordelia Schmid
Ahmet Iscen
28
1
0
31 Oct 2024
Visual Text Matters: Improving Text-KVQA with Visual Text Entity Knowledge-aware Large Multimodal Assistant
A. S. Penamakuri
Anand Mishra
19
1
0
24 Oct 2024
RAP: Retrieval-Augmented Personalization for Multimodal Large Language Models
Haoran Hao
Jiaming Han
Changsheng Li
Yu-Feng Li
Xiangyu Yue
RALM
43
1
0
17 Oct 2024
Understanding the Role of LLMs in Multimodal Evaluation Benchmarks
Botian Jiang
Lei Li
Xiaonan Li
Zhaowei Li
Xiachong Feng
Lingpeng Kong
Q. Liu
Xipeng Qiu
41
2
0
16 Oct 2024
MRAG-Bench: Vision-Centric Evaluation for Retrieval-Augmented Multimodal Models
Wenbo Hu
Jia-Chen Gu
Zi-Yi Dou
Mohsen Fayyaz
Pan Lu
Kai-Wei Chang
Nanyun Peng
VLM
58
4
0
10 Oct 2024
Unified Multi-Modal Interleaved Document Representation for Information Retrieval
Jaewoo Lee
Joonho Ko
Jinheon Baek
Soyeong Jeong
Sung Ju Hwang
20
1
0
03 Oct 2024
Knowledge-Aware Reasoning over Multimodal Semi-structured Tables
Suyash Vardhan Mathur
J. Bafna
Kunal Kartik
Harshita Khandelwal
Manish Shrivastava
Vivek Gupta
Mohit Bansal
Dan Roth
LMTD
17
0
0
25 Aug 2024
MIBench: Evaluating Multimodal Large Language Models over Multiple Images
Haowei Liu
Xi Zhang
Haiyang Xu
Yaya Shi
Chaoya Jiang
...
Ji Zhang
Fei Huang
Chunfen Yuan
Bing Li
Weiming Hu
VLM
29
10
0
21 Jul 2024
EchoSight: Advancing Visual-Language Models with Wiki Knowledge
Yibin Yan
Weidi Xie
RALM
27
8
0
17 Jul 2024
Granular Privacy Control for Geolocation with Vision Language Models
Ethan Mendes
Yang Chen
James Hays
Sauvik Das
Wei-ping Xu
Alan Ritter
34
3
0
06 Jul 2024
Synthetic Multimodal Question Generation
Ian Wu
Sravan Jayanthi
Vijay Viswanathan
Simon Rosenberg
Sina Pakazad
Tongshuang Wu
Graham Neubig
26
2
0
02 Jul 2024
SK-VQA: Synthetic Knowledge Generation at Scale for Training Context-Augmented Multimodal LLMs
Xin Su
Man Luo
Kris W Pan
Tien Pei Chou
Vasudev Lal
Phillip Howard
35
3
0
28 Jun 2024
MATE: Meet At The Embedding -- Connecting Images with Long Texts
Young Kyun Jang
Junmo Kang
Yong Jae Lee
Donghyun Kim
VLM
26
5
0
26 Jun 2024
African or European Swallow? Benchmarking Large Vision-Language Models for Fine-Grained Object Classification
Gregor Geigle
Radu Timofte
Goran Glavas
29
9
0
20 Jun 2024
Automatic benchmarking of large multimodal models via iterative experiment programming
Alessandro Conti
Enrico Fini
Paolo Rota
Yiming Wang
Massimiliano Mancini
Elisa Ricci
30
0
0
18 Jun 2024
What do MLLMs hear? Examining reasoning with text and sound components in Multimodal Large Language Models
Enis Berk Çoban
Michael I. Mandel
Johanna Devaney
AuLLM
LRM
36
0
0
07 Jun 2024
Reverse Image Retrieval Cues Parametric Memory in Multimodal LLMs
Jialiang Xu
Michael Moor
J. Leskovec
19
2
0
29 May 2024
Wiki-LLaVA: Hierarchical Retrieval-Augmented Generation for Multimodal LLMs
Davide Caffagni
Federico Cocchi
Nicholas Moratelli
Sara Sarto
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
KELM
24
34
0
23 Apr 2024
DesignQA: A Multimodal Benchmark for Evaluating Large Language Models' Understanding of Engineering Documentation
Anna C. Doris
Daniele Grandi
Ryan Tomich
Md Ferdous Alam
Hyunmin Cheong
Faez Ahmed
30
14
0
11 Apr 2024
MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions
Kai Zhang
Yi Luan
Hexiang Hu
Kenton Lee
Siyuan Qiao
Wenhu Chen
Yu-Chuan Su
Ming-Wei Chang
VLM
LRM
31
32
0
28 Mar 2024
Quantifying and Mitigating Unimodal Biases in Multimodal Large Language Models: A Causal Perspective
Meiqi Chen
Yixin Cao
Yan Zhang
Chaochao Lu
21
12
0
27 Mar 2024
Entity6K: A Large Open-Domain Evaluation Dataset for Real-World Entity Recognition
Jielin Qiu
William Jongwon Han
Winfred Wang
Zhengyuan Yang
Linjie Li
Jianfeng Wang
Christos Faloutsos
Lei Li
Lijuan Wang
VLM
56
2
0
19 Mar 2024
SnapNTell: Enhancing Entity-Centric Visual Question Answering with Retrieval Augmented Multimodal LLM
Jielin Qiu
Andrea Madotto
Zhaojiang Lin
Paul A. Crook
Y. Xu
Xin Luna Dong
Christos Faloutsos
Lei Li
Babak Damavandi
Seungwhan Moon
26
8
0
07 Mar 2024
LSTP: Language-guided Spatial-Temporal Prompt Learning for Long-form Video-Text Understanding
Yuxuan Wang
Yueqian Wang
Pengfei Wu
Jianxin Liang
Dongyan Zhao
Zilong Zheng
VLM
21
9
0
25 Feb 2024
Cognitive Visual-Language Mapper: Advancing Multimodal Comprehension with Enhanced Visual Knowledge Alignment
Yunxin Li
Xinyu Chen
Baotian Hu
Haoyuan Shi
Min-Ling Zhang
37
3
0
21 Feb 2024
The Revolution of Multimodal Large Language Models: A Survey
Davide Caffagni
Federico Cocchi
Luca Barsellotti
Nicholas Moratelli
Sara Sarto
Lorenzo Baraldi
Lorenzo Baraldi
Marcella Cornia
Rita Cucchiara
LRM
VLM
46
41
0
19 Feb 2024
MIKE: A New Benchmark for Fine-grained Multimodal Entity Knowledge Editing
Jiaqi Li
Miaozeng Du
Chuanyi Zhang
Yongrui Chen
Nan Hu
Guilin Qi
Haiyun Jiang
Siyuan Cheng
Bo Tian
18
14
0
18 Feb 2024
PreFLMR: Scaling Up Fine-Grained Late-Interaction Multi-modal Retrievers
Weizhe Lin
Jingbiao Mei
Jinghong Chen
Bill Byrne
VLM
AI4Ed
74
17
0
13 Feb 2024
Safety of Multimodal Large Language Models on Images and Texts
Xin Liu
Yichen Zhu
Yunshi Lan
Chao Yang
Yu Qiao
24
27
0
01 Feb 2024
Cross-modal Retrieval for Knowledge-based Visual Question Answering
Paul Lerner
Olivier Ferret
C. Guinaudeau
28
7
0
11 Jan 2024
Instruct-Imagen: Image Generation with Multi-modal Instruction
Hexiang Hu
Kelvin C. K. Chan
Yu-Chuan Su
Wenhu Chen
Yandong Li
...
Xue Ben
Boqing Gong
William W. Cohen
Ming-Wei Chang
Xuhui Jia
MLLM
33
42
0
03 Jan 2024
UniIR: Training and Benchmarking Universal Multimodal Information Retrievers
Cong Wei
Yang Chen
Haonan Chen
Hexiang Hu
Ge Zhang
Jie Fu
Alan Ritter
Wenhu Chen
28
50
0
28 Nov 2023
Fully Authentic Visual Question Answering Dataset from Online Communities
Chongyan Chen
Mengchen Liu
Noel Codella
Yunsheng Li
Lu Yuan
Danna Gurari
22
5
0
27 Nov 2023
A Comprehensive Evaluation of GPT-4V on Knowledge-Intensive Visual Question Answering
Yunxin Li
Longyue Wang
Baotian Hu
Xinyu Chen
Wanqi Zhong
Chenyang Lyu
Wei Wang
Min Zhang
ELM
20
21
0
13 Nov 2023
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
Shilong Liu
Hao Cheng
Haotian Liu
Hao Zhang
Feng Li
...
Hang Su
Jun Zhu
Lei Zhang
Jianfeng Gao
Chun-yue Li
MLLM
VLM
47
102
0
09 Nov 2023
Can Language Models be Instructed to Protect Personal Information?
Yang Chen
Ethan Mendes
Sauvik Das
Wei-ping Xu
Alan Ritter
PILM
11
34
0
03 Oct 2023
1
2
Next