MultiModalQA: Complex Question Answering over Text, Tables and Images


International Conference on Learning Representations (ICLR), 2021
13 April 2021
Alon Talmor
Ori Yoran
Amnon Catav
Dan Lahav
Yizhong Wang
Akari Asai
Gabriel Ilharco
Hannaneh Hajishirzi
Jonathan Berant
    LMTD

Papers citing "MultiModalQA: Complex Question Answering over Text, Tables and Images"

50 / 87 papers shown
WearVQA: A Visual Question Answering Benchmark for Wearables in Egocentric Authentic Real-world scenarios
Eun Chang
Z. Huang
Yiwei Liao
Sagar Ravi Bhavsar
Amogh Param
...
Babak Damavandi
Rakesh Wanga
Anuj Kumar
Rohit Patel
Xin Luna Dong
74
0
0
27 Nov 2025
Bridging the Modality Gap by Similarity Standardization with Pseudo-Positive Samples
Shuhei Yamashita
Daiki Shirafuji
Tatsuhiko Saito
76
0
0
27 Nov 2025
CRAG-MM: Multi-modal Multi-turn Comprehensive RAG Benchmark
Jiaqi Wang
X. J. Yang
Kai Sun
Parth Suresh
Sanat Sharma
...
Rakesh Wanga
Anuj Kumar
Rohit Patel
Wen-tau Yih
Xin Luna Dong
125
2
0
30 Oct 2025
Document Intelligence in the Era of Large Language Models: A Survey
Weishi Wang
Hengchang Hu
Zhijie Zhang
Zhaochen Li
Hongxin Shao
Daniel Dahlmeier
AI4TS
188
0
0
15 Oct 2025
CFVBench: A Comprehensive Video Benchmark for Fine-grained Multimodal Retrieval-Augmented Generation
Kaiwen Wei
Xiao-Yang Liu
Jie Zhang
Zijian Wang
Ruida Liu
...
C. Pan
Y. Zhang
Jiang Zhong
Peijin Wang
Yingchao Feng
VGen VLM
104
0
0
10 Oct 2025
Table Question Answering in the Era of Large Language Models: A Comprehensive Survey of Tasks, Methods, and Evaluation
Wei Zhou
Bolei Ma
Annemarie Friedrich
Mohsen Mesgar
LMTD ELM
172
0
0
08 Oct 2025
Memory-QA: Answering Recall Questions Based on Multimodal Memories
Hongda Jiang
Xinyuan Zhang
Siddhant Garg
Rishab Arora
Shiun-Zu Kuo
...
Yue Liu
Aaron Colak
Ahmed Aly
Anuj Kumar
Xin Luna Dong
169
0
0
22 Sep 2025
Visual-TableQA: Open-Domain Benchmark for Reasoning over Table Images
Boammani Aser Lompo
Marc Haraoui
LMTD ReLM VLM LRM
125
1
0
09 Sep 2025
Research on Multi-hop Inference Optimization of LLM Based on MQUAKE Framework
Zucheng Liang
Wenxin Wei
Kaijie Zhang
Hongyi Chen
49
1
0
05 Sep 2025
CMRAG: Co-modality-based visual document retrieval and question answering
Wang Chen
Guanqiang Qi
Yang Li
Lei Sha
Deguo Xia
Jizhou Huang
209
0
0
02 Sep 2025
Mind the (Language) Gap: Towards Probing Numerical and Cross-Lingual Limits of LVLMs
Somraj Gautam
Abhirama Subramanyam Penamakuri
Abhishek Bhandari
Gaurav Harit
LMTD LRM
266
2
0
24 Aug 2025
OMHBench: Benchmarking Balanced and Grounded Omni-Modal Multi-Hop Reasoning
Seunghee Kim
Ingyu Bang
Seokgyu Jang
Changhyeon Kim
Sanghwan Bae
Jihun Choi
Richeng Xuan
Taeuk Kim
LRM
121
0
0
22 Aug 2025
MMAPG: A Training-Free Framework for Multimodal Multi-hop Question Answering via Adaptive Planning Graphs
Yiheng Hu
Xiaoyang Wang
Qing Liu
Xiwei Xu
Qian Fu
Wenjie Zhang
Liming Zhu
138
1
0
22 Aug 2025
MoNaCo: More Natural and Complex Questions for Reasoning Across Dozens of Documents
Tomer Wolfson
H. Trivedi
Mor Geva
Yoav Goldberg
Dan Roth
Tushar Khot
Ashish Sabharwal
Reut Tsarfaty
RALM LRM
285
5
0
15 Aug 2025
AURA: A Fine-Grained Benchmark and Decomposed Metric for Audio-Visual Reasoning
Siminfar Samakoush Galougah
Rishie Raj
Sanjoy Chowdhury
Sayan Nag
Ramani Duraiswami
181
3
0
10 Aug 2025
Analyze-Prompt-Reason: A Collaborative Agent-Based Framework for Multi-Image Vision-Language Reasoning
Angelos Vlachos
Giorgos Filandrianos
Maria Lymperaiou
Nikolaos Spanos
Ilias Mitsouras
Vasileios Karampinis
Athanasios Voulodimos
LRM
134
0
0
01 Aug 2025
DeepSieve: Information Sieving via LLM-as-a-Knowledge-Router
Minghao Guo
Qingcheng Zeng
Xujiang Zhao
Yanchi Liu
Wenchao Yu
Mengnan Du
Haifeng Chen
Wei Cheng
RALM
287
5
0
29 Jul 2025
Towards Multimodal Graph Large Language Model
Science China Information Sciences (Sci. China Inf. Sci.), 2025
Xin Wang
Zeyang Zhang
Linxin Xiao
Haibo Chen
Chendi Ge
Wenwu Zhu
220
0
0
11 Jun 2025
BioMol-MQA: A Multi-Modal Question Answering Dataset For LLM Reasoning Over Bio-Molecular Interactions
Saptarshi Sengupta
Shuhua Yang
Paul Kwong Yu
Fali Wang
Suhang Wang
207
1
0
06 Jun 2025
MMTBENCH: A Unified Benchmark for Complex Multimodal Table Reasoning
Prasham Yatinkumar Titiya
Jainil Trivedi
Chitta Baral
Vivek Gupta
LMTD
230
4
0
27 May 2025
POQD: Performance-Oriented Query Decomposer for Multi-vector retrieval
Yaoyang Liu
Junlin Li
Yinjun Wu
Zhen Chen
313
1
0
25 May 2025
Abacus: A Cost-Based Optimizer for Semantic Operator Systems
Matthew Russo
Om Chabra
Gerardo Vitagliano
Chunwei Liu
Tim Kraska
Samuel Madden
Michael Cafarella
347
1
0
20 May 2025
Towards Temporal-Aware Multi-Modal Retrieval Augmented Generation in Finance
Fengbin Zhu
Junfeng Li
Liangming Pan
Wenjie Wang
Fuli Feng
Chao Wang
Huanbo Luan
Tat-Seng Chua
AIFin
355
2
0
07 Mar 2025
MM-PoisonRAG: Disrupting Multimodal RAG with Local and Global Poisoning Attacks
Hyeonjeong Ha
Qiusi Zhan
Jeonghwan Kim
Dimitrios Bralios
Saikrishna Sanniboina
Nanyun Peng
Kai-Wei Chang
Daniel Kang
Heng Ji
KELM AAML
385
10
0
25 Feb 2025
OmniQuery: Contextually Augmenting Captured Multimodal Memory to Enable Personal Question Answering
International Conference on Human Factors in Computing Systems (CHI), 2024
Jiahao Nick Li
Zhuohao Jerry Zhang
Zhang
424
6
0
24 Feb 2025
Ask in Any Modality: A Comprehensive Survey on Multimodal Retrieval-Augmented Generation
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Mohammad Mahdi Abootorabi
Amirhosein Zobeiri
Mahdi Dehghani
Mohammadali Mohammadkhani
Bardia Mohammadi
Omid Ghahroodi
M. Baghshah
Ehsaneddin Asgari
RALM
718
31
0
12 Feb 2025
RAMQA: A Unified Framework for Retrieval-Augmented Multi-Modal Question Answering
North American Chapter of the Association for Computational Linguistics (NAACL), 2025
Yang Bai
Christan Earl Grant
Daisy Zhe Wang
RALM
267
5
0
23 Jan 2025
Multimodal Multihop Source Retrieval for Web Question Answering
Navya Yarrabelly
Saloni Mittal
151
0
0
07 Jan 2025
FCMR: Robust Evaluation of Financial Cross-Modal Multi-Hop Reasoning
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Seunghee Kim
Changhyeon Kim
Taeuk Kim
LRM
446
7
0
17 Dec 2024
Dynamic Strategy Planning for Efficient Question Answering with Large Language Models
Tanmay Parekh
Pradyot Prakash
Alexander Radovic
Akshay Shekher
Denis Savenkov
LRM
841
3
0
30 Oct 2024
Self-adaptive Multimodal Retrieval-Augmented Generation
Wenjia Zhai
VLM
191
3
0
15 Oct 2024
MRAG-Bench: Vision-Centric Evaluation for Retrieval-Augmented Multimodal Models
International Conference on Learning Representations (ICLR), 2024
Wenbo Hu
Jia-Chen Gu
Zi-Yi Dou
Mohsen Fayyaz
Pan Lu
Kai-Wei Chang
Nanyun Peng
VLM
353
29
0
10 Oct 2024
MuRAR: A Simple and Effective Multimodal Retrieval and Answer Refinement Framework for Multimodal Question Answering
International Conference on Computational Linguistics (COLING), 2024
Zhengyuan Zhu
Daniel Lee
Hong Zhang
Sai Sree Harsha
Loic Feujio
Akash Maharaj
Yunyao Li
171
6
0
16 Aug 2024
FlashRAG: A Modular Toolkit for Efficient Retrieval-Augmented Generation Research
Jiajie Jin
Yutao Zhu
Xinyu Yang
Chenghao Zhang
Tong Zhao
Zhao Yang
Zhicheng Dou
Ji-Rong Wen
VLM
405
139
0
22 May 2024
RJUA-MedDQA: A Multimodal Benchmark for Medical Document Question Answering and Clinical Reasoning
Congyun Jin
Ming Zhang
Xiaowei Ma
Yujiao Li
Yingbo Wang
...
Chenfei Chi
Xiangguo Lv
Fangzhou Li
Wei Xue
Yiran Huang
LM&MA
181
10
0
19 Feb 2024
Evaluating LLMs' Mathematical Reasoning in Financial Document Question Answering
Pragya Srivastava
Manuj Malik
Vivek Gupta
T. Ganu
Dan Roth
259
39
0
17 Feb 2024
Text-to-Image Cross-Modal Generation: A Systematic Review
Maciej Żelaszczyk
Jacek Mańdziuk
320
6
0
21 Jan 2024
MMToM-QA: Multimodal Theory of Mind Question Answering
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Chuanyang Jin
Yutong Wu
Jing Cao
Jiannan Xiang
Yen-Ling Kuo
Zhiting Hu
T. Ullman
Antonio Torralba
Joshua B. Tenenbaum
Tianmin Shu
310
68
0
16 Jan 2024
DIVKNOWQA: Assessing the Reasoning Ability of LLMs via Open-Domain Question Answering over Knowledge Base and Text
Wenting Zhao
Ye Liu
Tong Niu
Yao Wan
Philip S. Yu
Shafiq Joty
Yingbo Zhou
Semih Yavuz
LRM
207
9
0
31 Oct 2023
Progressive Evidence Refinement for Open-domain Multimodal Retrieval Question Answering
Shuwen Yang
Anran Wu
Xingjiao Wu
Luwei Xiao
Tianlong Ma
Cheng Jin
Liang He
210
7
0
15 Oct 2023
Through the Lens of Core Competency: Survey on Evaluation of Large Language Models
China National Conference on Chinese Computational Linguistics (CNCCL), 2023
Ziyu Zhuang
Qiguang Chen
Longxuan Ma
Mingda Li
Yi Han
Yushan Qian
Haopeng Bai
Zixian Feng
Weinan Zhang
Ting Liu
ELM
193
23
0
15 Aug 2023
Fine-tuning Multimodal LLMs to Follow Zero-shot Demonstrative Instructions
International Conference on Learning Representations (ICLR), 2023
Juncheng Li
Kaihang Pan
Zhiqi Ge
Minghe Gao
Wei Ji
Wenqiao Zhang
Tat-Seng Chua
Siliang Tang
Hanwang Zhang
Yueting Zhuang
MLLM
315
89
0
08 Aug 2023
DialogStudio: Towards Richest and Most Diverse Unified Dataset Collection for Conversational AI
Findings (Findings), 2023
Jianguo Zhang
Kun Qian
Zhiwei Liu
Shelby Heinecke
Rui Meng
Ye Liu
Zhou Yu
Huan Wang
Silvio Savarese
Caiming Xiong
306
29
0
19 Jul 2023
Read, Look or Listen? What's Needed for Solving a Multimodal Dataset
Netta Madvil
Yonatan Bitton
Roy Schwartz
216
3
0
06 Jul 2023
Answer Mining from a Pool of Images: Towards Retrieval-Based Visual Question Answering
International Joint Conference on Artificial Intelligence (IJCAI), 2023
A. S. Penamakuri
Manish Gupta
Mithun Das Gupta
Anand Mishra
166
7
0
29 Jun 2023
Pre-Training Multi-Modal Dense Retrievers for Outside-Knowledge Visual Question Answering
International Conference on the Theory of Information Retrieval (ICTIR), 2023
Alireza Salemi
Mahta Rafiee
Hamed Zamani
173
13
0
28 Jun 2023
Table Meets LLM: Can Large Language Models Understand Structured Table Data? A Benchmark and Empirical Study
Web Search and Data Mining (WSDM), 2023
Yuan Sui
Mengyu Zhou
Mingjie Zhou
Shi Han
Dongmei Zhang
LMTD
374
156
0
22 May 2023
MPMQA: Multimodal Question Answering on Product Manuals
AAAI Conference on Artificial Intelligence (AAAI), 2023
Liangfu Zhang
Anwen Hu
Jing Zhang
Shuo Hu
Qin Jin
193
14
0
19 Apr 2023
VTQA: Visual Text Question Answering via Entity Alignment and Cross-Media Reasoning
Computer Vision and Pattern Recognition (CVPR), 2023
Kan Chen
Xiangqian Wu
CoGe
159
19
0
05 Mar 2023
Complex QA and language models hybrid architectures, Survey
Xavier Daull
P. Bellot
Emmanuel Bruno
Vincent Martin
Elisabeth Murisasco
ELM
683
17
0
17 Feb 2023