MultiModalQA: Complex Question Answering over Text, Tables and Images

International Conference on Learning Representations (ICLR), 2021

13 April 2021

Papers citing "MultiModalQA: Complex Question Answering over Text, Tables and Images"

WearVQA: A Visual Question Answering Benchmark for Wearables in Egocentric Authentic Real-world scenarios

...

27 Nov 2025

Bridging the Modality Gap by Similarity Standardization with Pseudo-Positive Samples

Shuhei Yamashita

Daiki Shirafuji

Tatsuhiko Saito

27 Nov 2025

CRAG-MM: Multi-modal Multi-turn Comprehensive RAG Benchmark

...

30 Oct 2025

Document Intelligence in the Era of Large Language Models: A Survey

15 Oct 2025

CFVBench: A Comprehensive Video Benchmark for Fine-grained Multimodal Retrieval-Augmented Generation

...

10 Oct 2025

Table Question Answering in the Era of Large Language Models: A Comprehensive Survey of Tasks, Methods, and Evaluation

08 Oct 2025

Memory-QA: Answering Recall Questions Based on Multimodal Memories

...

22 Sep 2025

Visual-TableQA: Open-Domain Benchmark for Reasoning over Table Images

Boammani Aser Lompo

Marc Haraoui

LMTD ReLM VLM LRM

09 Sep 2025

Research on Multi-hop Inference Optimization of LLM Based on MQUAKE Framework

05 Sep 2025

CMRAG: Co-modality-based visual document retrieval and question answering

02 Sep 2025

Mind the (Language) Gap: Towards Probing Numerical and Cross-Lingual Limits of LVLMs

Somraj Gautam

Abhirama Subramanyam Penamakuri

Abhishek Bhandari

Gaurav Harit

LMTD LRM

24 Aug 2025

OMHBench: Benchmarking Balanced and Grounded Omni-Modal Multi-Hop Reasoning

22 Aug 2025

MMAPG: A Training-Free Framework for Multimodal Multi-hop Question Answering via Adaptive Planning Graphs

22 Aug 2025

MoNaCo: More Natural and Complex Questions for Reasoning Across Dozens of Documents

15 Aug 2025

AURA: A Fine-Grained Benchmark and Decomposed Metric for Audio-Visual Reasoning

Siminfar Samakoush Galougah

10 Aug 2025

Analyze-Prompt-Reason: A Collaborative Agent-Based Framework for Multi-Image Vision-Language Reasoning

Athanasios Voulodimos

LRM

01 Aug 2025

DeepSieve: Information Sieving via LLM-as-a-Knowledge-Router

29 Jul 2025

Towards Multimodal Graph Large Language Model

Science China Information Sciences (Sci. China Inf. Sci.), 2025

11 Jun 2025

BioMol-MQA: A Multi-Modal Question Answering Dataset For LLM Reasoning Over Bio-Molecular Interactions

06 Jun 2025

MMTBENCH: A Unified Benchmark for Complex Multimodal Table Reasoning

Prasham Yatinkumar Titiya

27 May 2025

POQD: Performance-Oriented Query Decomposer for Multi-vector retrieval

25 May 2025

Abacus: A Cost-Based Optimizer for Semantic Operator Systems

20 May 2025

Towards Temporal-Aware Multi-Modal Retrieval Augmented Generation in Finance

07 Mar 2025

MM-PoisonRAG: Disrupting Multimodal RAG with Local and Global Poisoning Attacks

Saikrishna Sanniboina

25 Feb 2025

OmniQuery: Contextually Augmenting Captured Multimodal Memory to Enable Personal Question Answering

International Conference on Human Factors in Computing Systems (CHI), 2024

Jiahao Nick Li

Zhuohao Jerry Zhang

Zhang

24 Feb 2025

Ask in Any Modality: A Comprehensive Survey on Multimodal Retrieval-Augmented Generation

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

Mohammad Mahdi Abootorabi

Amirhosein Zobeiri

Mahdi Dehghani

Mohammadali Mohammadkhani

12 Feb 2025

RAMQA: A Unified Framework for Retrieval-Augmented Multi-Modal Question Answering

North American Chapter of the Association for Computational Linguistics (NAACL), 2025

23 Jan 2025

Multimodal Multihop Source Retrieval for Web Question Answering

Navya Yarrabelly

Saloni Mittal

07 Jan 2025

FCMR: Robust Evaluation of Financial Cross-Modal Multi-Hop Reasoning

Annual Meeting of the Association for Computational Linguistics (ACL), 2024

17 Dec 2024

Dynamic Strategy Planning for Efficient Question Answering with Large Language Models

30 Oct 2024

Self-adaptive Multimodal Retrieval-Augmented Generation

Wenjia Zhai

VLM

15 Oct 2024

MRAG-Bench: Vision-Centric Evaluation for Retrieval-Augmented Multimodal Models

International Conference on Learning Representations (ICLR), 2024

Pan Lu

Kai-Wei Chang

Nanyun Peng

VLM

10 Oct 2024

MuRAR: A Simple and Effective Multimodal Retrieval and Answer Refinement Framework for Multimodal Question Answering

International Conference on Computational Linguistics (COLING), 2024

Daniel Lee

Yunyao Li

16 Aug 2024

FlashRAG: A Modular Toolkit for Efficient Retrieval-Augmented Generation Research

Jiajie Jin

Chenghao Zhang

Tong Zhao

Zhao Yang

Zhicheng Dou

Ji-Rong Wen

VLM

22 May 2024

RJUA-MedDQA: A Multimodal Benchmark for Medical Document Question Answering and Clinical Reasoning

...

19 Feb 2024

Evaluating LLMs' Mathematical Reasoning in Financial Document Question Answering

17 Feb 2024

Text-to-Image Cross-Modal Generation: A Systematic Review

Maciej Żelaszczyk

Jacek Mańdziuk

21 Jan 2024

MMToM-QA: Multimodal Theory of Mind Question Answering

Annual Meeting of the Association for Computational Linguistics (ACL), 2024

Joshua B. Tenenbaum

16 Jan 2024

DIVKNOWQA: Assessing the Reasoning Ability of LLMs via Open-Domain Question Answering over Knowledge Base and Text

Philip S. Yu

Yingbo Zhou

31 Oct 2023

Progressive Evidence Refinement for Open-domain Multimodal Retrieval Question Answering

Xingjiao Wu

15 Oct 2023

Through the Lens of Core Competency: Survey on Evaluation of Large Language Models

China National Conference on Chinese Computational Linguistics (CNCCL), 2023

15 Aug 2023

Fine-tuning Multimodal LLMs to Follow Zero-shot Demonstrative Instructions

International Conference on Learning Representations (ICLR), 2023

Wei Ji

08 Aug 2023

DialogStudio: Towards Richest and Most Diverse Unified Dataset Collection for Conversational AI

Findings (Findings), 2023

Kun Qian

Huan Wang

Silvio Savarese

Caiming Xiong

19 Jul 2023

Read, Look or Listen? What's Needed for Solving a Multimodal Dataset

Netta Madvil

Yonatan Bitton

Roy Schwartz

06 Jul 2023

Answer Mining from a Pool of Images: Towards Retrieval-Based Visual Question Answering

International Joint Conference on Artificial Intelligence (IJCAI), 2023

29 Jun 2023

Pre-Training Multi-Modal Dense Retrievers for Outside-Knowledge Visual Question Answering

International Conference on the Theory of Information Retrieval (ICTIR), 2023

Alireza Salemi

Mahta Rafiee

Hamed Zamani

28 Jun 2023

Table Meets LLM: Can Large Language Models Understand Structured Table Data? A Benchmark and Empirical Study

Web Search and Data Mining (WSDM), 2023

Yuan Sui

22 May 2023

MPMQA: Multimodal Question Answering on Product Manuals

AAAI Conference on Artificial Intelligence (AAAI), 2023

Liangfu Zhang

Anwen Hu

Jing Zhang

Shuo Hu

Qin Jin

19 Apr 2023

VTQA: Visual Text Question Answering via Entity Alignment and Cross-Media Reasoning

Computer Vision and Pattern Recognition (CVPR), 2023

Kan Chen

Xiangqian Wu

CoGe

05 Mar 2023

Complex QA and language models hybrid architectures, Survey

17 Feb 2023