Can Pre-trained Vision and Language Models Answer Visual Information-Seeking Questions?

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
23 February 2023
Yang Chen
Hexiang Hu
Yi Luan
Haitian Sun
Soravit Changpinyo
Alan Ritter
Ming-Wei Chang

Papers citing "Can Pre-trained Vision and Language Models Answer Visual Information-Seeking Questions?"

50 of 58 citing papers shown
Breaking the Visual Shortcuts in Multimodal Knowledge-Based Visual Question Answering
Dosung Lee
Sangwon Jung
Boyoung Kim
Minyoung Kim
Sungyeon Kim
Junyoung Sung
Paul Hongsuck Seo
161
0
0
28 Nov 2025
ReAG: Reasoning-Augmented Generation for Knowledge-based Visual Question Answering
Alberto Compagnoni
Marco Morini
Sara Sarto
Federico Cocchi
Davide Caffagni
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
RALM, LRM
255
2
0
27 Nov 2025
SciEGQA: A Dataset for Scientific Evidence-Grounded Question Answering and Reasoning
Wenhan Yu
Wang Chen
Guanqiang Qi
Weikang Li
Yang Li
Lei Sha
Deguo Xia
Jizhou Huang
211
4
0
19 Nov 2025
HV-Attack: Hierarchical Visual Attack for Multimodal Retrieval Augmented Generation
Linyin Luo
Yujuan Ding
Yunshan Ma
Wenqi Fan
Hanjiang Lai
AAML
274
1
0
19 Nov 2025
DeepEyesV2: Toward Agentic Multimodal Model
IEEE Transactions on Audio, Speech, and Language Processing (TASLP), 2025
Jack Hong
Chenxiao Zhao
ChengLin Zhu
Weiheng Lu
Guohai Xu
Xing Yu
181
30
0
07 Nov 2025
Unified Reinforcement and Imitation Learning for Vision-Language Models
Byung-Kwan Lee
Ryo Hachiuma
Yong Man Ro
Yu-Chun Wang
Yueh-Hua Wu
VLM
211
4
0
22 Oct 2025
Exploring a Unified Vision-Centric Contrastive Alternatives on Multi-Modal Web Documents
Yiqi Lin
Alex Jinpeng Wang
Linjie Li
Zhengyuan Yang
Mike Zheng Shou
165
1
0
21 Oct 2025
A Comprehensive Survey on Reinforcement Learning-based Agentic Search: Foundations, Roles, Optimizations, Evaluations, and Applications
Minhua Lin
Zongyu Wu
Zhichao Xu
Hui Liu
Xianfeng Tang
Qi He
Charu C. Aggarwal
Hui Liu
Xiang Zhang
Suhang Wang
AI4TS, LRM
625
9
0
19 Oct 2025
Knowledge-based Visual Question Answer with Multimodal Processing, Retrieval and Filtering
Yuyang Hong
Jiaqi Gu
Qi Yang
Lubin Fan
Yue-bo Wu
Ying Wang
Kun Ding
Shiming Xiang
Jieping Ye
256
9
0
16 Oct 2025
NExT-OMNI: Towards Any-to-Any Omnimodal Foundation Models with Discrete Flow Matching
Run Luo
Xiaobo Xia
Lu Wang
Longze Chen
Renke Shan
Jing Luo
Min Yang
Tat-Seng Chua
VGen
305
9
0
15 Oct 2025
DeepMMSearch-R1: Empowering Multimodal LLMs in Multimodal Web Search
Kartik Narayan
Yang Xu
Tian Cao
Kavya Nerella
Vishal M. Patel
Navid Shiee
Peter Grasch
Chao Jia
Yinfei Yang
Zhe Gan
ObjD, KELM, VLM
295
16
0
14 Oct 2025
CFVBench: A Comprehensive Video Benchmark for Fine-grained Multimodal Retrieval-Augmented Generation
Kaiwen Wei
Xiao-Yang Liu
Jie Zhang
Zijian Wang
Ruida Liu
...
C. Pan
Y. Zhang
Jiang Zhong
Peijin Wang
Yingchao Feng
VGen, VLM
156
0
0
10 Oct 2025
MRMR: A Realistic and Expert-Level Multidisciplinary Benchmark for Reasoning-Intensive Multimodal Retrieval
Siyue Zhang
Yuan Gao
Xiao Zhou
Yilun Zhao
Tingyu Song
Arman Cohan
Anh Tuan Luu
Chen Zhao
VLM, LRM
175
1
0
10 Oct 2025
Retrv-R1: A Reasoning-Driven MLLM Framework for Universal and Efficient Multimodal Retrieval
Lanyun Zhu
Deyi Ji
Tianrun Chen
Haiyang Wu
Shiqi Wang
LRM
230
4
0
03 Oct 2025
Generalized Contrastive Learning for Universal Multimodal Retrieval
Jungsoo Lee
Janghoon Cho
Hyojin Park
Munawar Hayat
Kyuwoong Hwang
Fatih Porikli
Sungha Choi
VLM
226
4
0
30 Sep 2025
From Perception to Cognition: A Survey of Vision-Language Interactive Reasoning in Multimodal Large Language Models
Chenyue Zhou
Mingxuan Wang
Yanbiao Ma
Chenxu Wu
Wanyi Chen
...
Guoli Jia
Lingling Li
Z. Lu
Y. Lu
Wenhan Luo
LRM
611
13
0
29 Sep 2025
Recurrence Meets Transformers for Universal Multimodal Retrieval
Davide Caffagni
Sara Sarto
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
274
2
0
10 Sep 2025
Global-to-Local or Local-to-Global? Enhancing Image Retrieval with Efficient Local Search and Effective Global Re-ranking
Dror Aiger
Bingyi Cao
Kaifeng Chen
A. Araújo
236
1
0
04 Sep 2025
CMRAG: Co-modality-based visual document retrieval and question answering
Wang Chen
Guanqiang Qi
Guanqiang Qi
Yang Li
Yang Li
Lei Sha
Deguo Xia
Jizhou Huang
288
0
0
02 Sep 2025
Multimodal Iterative RAG for Knowledge-Intensive Visual Question Answering
Changin Choi
Wonseok Lee
Jungmin Ko
Wonjong Rhee
VLM, LRM
353
0
0
31 Aug 2025
mKG-RAG: Multimodal Knowledge Graph-Enhanced RAG for Visual Question Answering
Xu Yuan
Liangbo Ning
Wenqi Fan
Qing Li
236
9
0
07 Aug 2025
WebWatcher: Breaking New Frontier of Vision-Language Deep Research Agent
Xin Guan
Peng Xia
Zhen Zhang
Xinyu Wang
Qiuchen Wang
...
Kuan Li
Yong Jiang
Pengjun Xie
Fei Huang
Jingren Zhou
387
54
0
07 Aug 2025
On The Role of Pretrained Language Models in General-Purpose Text Embeddings: A Survey
Meishan Zhang
Xin Zhang
X. Zhao
Shouzheng Huang
Baotian Hu
Min Zhang
352
4
0
28 Jul 2025
Augmented Vision-Language Models: A Systematic Review
Anthony C Davis
Burhan Sadiq
Tianmin Shu
Chien-Ming Huang
VLM, LRM
219
0
0
24 Jul 2025
Taming the Untamed: Graph-Based Knowledge Retrieval and Reasoning for MLLMs to Conquer the Unknown
Bowen Wang
Zhouqiang Jiang
Yasuaki Susumu
Shotaro Miwa
Tianwei Chen
Yuta Nakashima
404
1
0
21 Jun 2025
GenRecal: Generation after Recalibration from Large to Small Vision-Language Models
Byung-Kwan Lee
Ryo Hachiuma
Yong Man Ro
Yu-Chun Wang
Yueh-Hua Wu
VLM
373
4
0
18 Jun 2025
CoRe-MMRAG: Cross-Source Knowledge Reconciliation for Multimodal RAG
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Yang Tian
Fan Liu
Jingyuan Zhang
Victoria A. Webster-Wood
Yupeng Hu
Liqiang Nie
VLM
286
13
0
03 Jun 2025
mRAG: Elucidating the Design Space of Multi-modal Retrieval-Augmented Generation
Chan-wei Hu
Yueqi Wang
Shuo Xing
Chia-Ju Chen
Zhengzhong Tu
Ryan Rossi
Zhengzhong Tu
3DV
432
2
0
29 May 2025
Spa-VLM: Stealthy Poisoning Attacks on RAG-based VLM
Lei Yu
Yechao Zhang
Ziqi Zhou
Yang Wu
Wei Wan
Minghui Li
Shengshan Hu
Pei Xiaobing
Jing Wang
AAML
293
3
0
28 May 2025
Learning to Route Queries Across Knowledge Bases for Step-wise Retrieval-Augmented Reasoning
Chunyi Peng
Zhipeng Xu
Zhenghao Liu
Yishan Li
Shi Yu
...
Zhiyuan Liu
Yu Gu
Minghe Yu
Ge Yu
Maosong Sun
LRM
341
3
0
28 May 2025
MMTBENCH: A Unified Benchmark for Complex Multimodal Table Reasoning
Prasham Yatinkumar Titiya
Jainil Trivedi
Chitta Baral
Vivek Gupta
LMTD
268
8
0
27 May 2025
OMGM: Orchestrate Multiple Granularities and Modalities for Efficient Multimodal Retrieval
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Wei Yang
Jingjing Fu
Rongpin Wang
Jinyu Wang
Lei Song
Jiang Bian
441
7
0
10 May 2025
MIEB: Massive Image Embedding Benchmark
Chenghao Xiao
Isaac Chung
Imene Kerboua
Jamie Stirling
Xin Zhang
Márton Kardos
Roman Solomatin
Noura Al Moubayed
Kenneth Enevoldsen
Niklas Muennighoff
VLM
576
7
0
14 Apr 2025
Retrieval Augmented Generation and Understanding in Vision: A Survey and New Outlook
Xu Zheng
Ziqiao Weng
Yuanhuiyi Lyu
Lutao Jiang
Haiwei Xue
Bin Ren
Danda Pani Paudel
Andrii Zadaianchuk
Luc Van Gool
Xuming Hu
3DV
413
35
0
23 Mar 2025
Poisoned-MRAG: Knowledge Poisoning Attacks to Multimodal Retrieval Augmented Generation
Yinuo Liu
Zenghui Yuan
Guiyao Tie
Jiawen Shi
Lichao Sun
Lichao Sun
Neil Zhenqiang Gong
620
7
0
08 Mar 2025
Fine-Grained Knowledge Structuring and Retrieval for Visual Question Answering
Zhengxuan Zhang
Yin Wu
Yuyu Luo
Nan Tang
435
0
0
28 Feb 2025
Joint Fusion and Encoding: Advancing Multimodal Retrieval from the Ground Up
Lang Huang
Qiyu Wu
Zhongtao Miao
T. Yamasaki
1.0K
6
0
27 Feb 2025
Detecting Knowledge Boundary of Vision Large Language Models by Sampling-Based Inference
Zhuo Chen
Xinyu Wang
Yong Jiang
Zhen Zhang
Xin Guan
Pengjun Xie
Fei Huang
Kewei Tu
500
6
0
25 Feb 2025
Visual-RAG: Benchmarking Text-to-Image Retrieval Augmented Generation for Visual Knowledge Intensive Queries
Yin Wu
Quanyu Long
Jing Li
Jianfei Yu
Wenya Wang
VLM
334
14
0
23 Feb 2025
Retrieval-Augmented Visual Question Answering via Built-in Autoregressive Search Engines
AAAI Conference on Artificial Intelligence (AAAI), 2025
Xinwei Long
Zhiyuan Ma
Ermo Hua
Kaiyan Zhang
Biqing Qi
Bowen Zhou
RALM
410
14
0
23 Feb 2025
LOVA3: Learning to Visual Question Answering, Asking and Assessment
Neural Information Processing Systems (NeurIPS), 2024
Henry Hengyuan Zhao
Pan Zhou
Difei Gao
Zechen Bai
Mike Zheng Shou
454
17
0
21 Feb 2025
Ask in Any Modality: A Comprehensive Survey on Multimodal Retrieval-Augmented Generation
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Mohammad Mahdi Abootorabi
Amirhosein Zobeiri
Mahdi Dehghani
Mohammadali Mohammadkhani
Bardia Mohammadi
Omid Ghahroodi
M. Baghshah
Ehsaneddin Asgari
RALM
817
43
0
12 Feb 2025
Generative Landmarks Guided Eyeglasses Removal 3D Face Reconstruction
Dapeng Zhao
Yue Qi
3DH, CVBM, 3DV
369
1
0
31 Dec 2024
GME: Improving Universal Multimodal Retrieval by Multimodal LLMs
Xin Zhang
Yanzhao Zhang
Wen Xie
Mingxin Li
Ziqi Dai
Dingkun Long
Pengjun Xie
Meishan Zhang
Wenjie Li
Hao Fei
613
108
0
22 Dec 2024
Performance Gap in Entity Knowledge Extraction Across Modalities in Vision Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Ido Cohen
Daniela Gottesman
Mor Geva
Raja Giryes
VLM
532
6
1
18 Dec 2024
Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent
International Conference on Learning Representations (ICLR), 2024
Yangning Li
Hai-Tao Zheng
Xinyu Wang
Yong Jiang
Zhen Zhang
...
Hui Wang
Hai-Tao Zheng
Pengjun Xie
Philip S. Yu
Fei Huang
717
64
0
05 Nov 2024
MM-Embed: Universal Multimodal Retrieval with Multimodal LLMs
International Conference on Learning Representations (ICLR), 2024
Sheng-Chieh Lin
Chankyu Lee
Mohammad Shoeybi
Jimmy J. Lin
Bryan Catanzaro
Ming-Yu Liu
1.0K
102
0
04 Nov 2024
MRAG-Bench: Vision-Centric Evaluation for Retrieval-Augmented Multimodal Models
International Conference on Learning Representations (ICLR), 2024
Wenbo Hu
Jia-Chen Gu
Zi-Yi Dou
Mohsen Fayyaz
Pan Lu
Kai-Wei Chang
Nanyun Peng
VLM
395
36
0
10 Oct 2024
EchoSight: Advancing Visual-Language Models with Wiki Knowledge
Yibin Yan
Weidi Xie
RALM
376
47
0
17 Jul 2024
SK-VQA: Synthetic Knowledge Generation at Scale for Training Context-Augmented Multimodal LLMs
Xin Su
Man Luo
Kris W Pan
Tien Pei Chou
Vasudev Lal
Phillip Howard
383
7
0
28 Jun 2024