2104.06039
Cited By
MultiModalQA: Complex Question Answering over Text, Tables and Images
International Conference on Learning Representations (ICLR), 2021
13 April 2021
Alon Talmor
Ori Yoran
Amnon Catav
Dan Lahav
Yizhong Wang
Akari Asai
Gabriel Ilharco
Hannaneh Hajishirzi
Jonathan Berant
LMTD
Papers citing
"MultiModalQA: Complex Question Answering over Text, Tables and Images"
50 / 80 papers shown
CRAG-MM: Multi-modal Multi-turn Comprehensive RAG Benchmark
Jiaqi Wang
X. J. Yang
Kai Sun
Parth Suresh
Sanat Sharma
...
Rakesh Wanga
Anuj Kumar
Rohit Patel
Wen-tau Yih
Xin Luna Dong
100
0
0
30 Oct 2025
Document Intelligence in the Era of Large Language Models: A Survey
Weishi Wang
Hengchang Hu
Zhijie Zhang
Zhaochen Li
Hongxin Shao
Daniel Dahlmeier
AI4TS
108
0
0
15 Oct 2025
CFVBench: A Comprehensive Video Benchmark for Fine-grained Multimodal Retrieval-Augmented Generation
Kaiwen Wei
Xiao-Yang Liu
Jie Zhang
Zijian Wang
Ruida Liu
...
C. Pan
Y. Zhang
Jiang Zhong
Peijin Wang
Yingchao Feng
VGen
VLM
76
0
0
10 Oct 2025
Table Question Answering in the Era of Large Language Models: A Comprehensive Survey of Tasks, Methods, and Evaluation
Wei Zhou
Bolei Ma
Annemarie Friedrich
Mohsen Mesgar
LMTD
ELM
121
0
0
08 Oct 2025
Memory-QA: Answering Recall Questions Based on Multimodal Memories
Hongda Jiang
Xinyuan Zhang
Siddhant Garg
Rishab Arora
Shiun-Zu Kuo
...
Yue Liu
Aaron Colak
Ahmed Aly
Anuj Kumar
Xin Luna Dong
123
0
0
22 Sep 2025
Visual-TableQA: Open-Domain Benchmark for Reasoning over Table Images
Boammani Aser Lompo
Marc Haraoui
LMTD
ReLM
VLM
LRM
103
1
0
09 Sep 2025
Research on Multi-hop Inference Optimization of LLM Based on MQUAKE Framework
Zucheng Liang
Wenxin Wei
Kaijie Zhang
Hongyi Chen
13
0
0
05 Sep 2025
CMRAG: Co-modality-based visual document retrieval and question answering
Wang Chen
Guanqiang Qi
Yang Li
Lei Sha
Deguo Xia
Jizhou Huang
149
0
0
02 Sep 2025
Mind the (Language) Gap: Towards Probing Numerical and Cross-Lingual Limits of LVLMs
Somraj Gautam
Abhirama Subramanyam Penamakuri
Abhishek Bhandari
Gaurav Harit
LMTD
LRM
186
2
0
24 Aug 2025
MMAPG: A Training-Free Framework for Multimodal Multi-hop Question Answering via Adaptive Planning Graphs
Yiheng Hu
Xiaoyang Wang
Qing Liu
Xiwei Xu
Qian Fu
Wenjie Zhang
Liming Zhu
100
1
0
22 Aug 2025
CMR-SPB: Cross-Modal Multi-Hop Reasoning over Text, Image, and Speech with Path Balance
Seunghee Kim
Ingyu Bang
Seokgyu Jang
Changhyeon Kim
Sanghwan Bae
Jihun Choi
Richeng Xuan
Taeuk Kim
LRM
68
0
0
22 Aug 2025
MoNaCo: More Natural and Complex Questions for Reasoning Across Dozens of Documents
Tomer Wolfson
H. Trivedi
Mor Geva
Yoav Goldberg
Dan Roth
Tushar Khot
Ashish Sabharwal
Reut Tsarfaty
RALM
LRM
249
5
0
15 Aug 2025
AURA: A Fine-Grained Benchmark and Decomposed Metric for Audio-Visual Reasoning
Siminfar Samakoush Galougah
Rishie Raj
Sanjoy Chowdhury
Sayan Nag
Ramani Duraiswami
124
1
0
10 Aug 2025
Analyze-Prompt-Reason: A Collaborative Agent-Based Framework for Multi-Image Vision-Language Reasoning
Angelos Vlachos
Giorgos Filandrianos
Maria Lymperaiou
Nikolaos Spanos
Ilias Mitsouras
Vasileios Karampinis
Athanasios Voulodimos
LRM
96
0
0
01 Aug 2025
DeepSieve: Information Sieving via LLM-as-a-Knowledge-Router
Minghao Guo
Qingcheng Zeng
Xujiang Zhao
Yanchi Liu
Wenchao Yu
Mengnan Du
Haifeng Chen
Wei Cheng
RALM
235
4
0
29 Jul 2025
Towards Multimodal Graph Large Language Model
Science China Information Sciences (Sci. China Inf. Sci.), 2025
Xin Wang
Zeyang Zhang
Linxin Xiao
Haibo Chen
Chendi Ge
Wenwu Zhu
143
0
0
11 Jun 2025
BioMol-MQA: A Multi-Modal Question Answering Dataset For LLM Reasoning Over Bio-Molecular Interactions
Saptarshi Sengupta
Shuhua Yang
Paul Kwong Yu
Fali Wang
Suhang Wang
173
1
0
06 Jun 2025
MMTBENCH: A Unified Benchmark for Complex Multimodal Table Reasoning
Prasham Yatinkumar Titiya
Jainil Trivedi
Chitta Baral
Vivek Gupta
LMTD
210
3
0
27 May 2025
POQD: Performance-Oriented Query Decomposer for Multi-vector retrieval
Yaoyang Liu
Junlin Li
Yinjun Wu
Zhen Chen
253
1
0
25 May 2025
Abacus: A Cost-Based Optimizer for Semantic Operator Systems
Matthew Russo
Om Chabra
Gerardo Vitagliano
Chunwei Liu
Tim Kraska
Samuel Madden
Michael Cafarella
282
1
0
20 May 2025
Towards Temporal-Aware Multi-Modal Retrieval Augmented Generation in Finance
Fengbin Zhu
Junfeng Li
Liangming Pan
Wenjie Wang
Fuli Feng
Chao Wang
Huanbo Luan
Tat-Seng Chua
AIFin
316
2
0
07 Mar 2025
MM-PoisonRAG: Disrupting Multimodal RAG with Local and Global Poisoning Attacks
Hyeonjeong Ha
Qiusi Zhan
Jeonghwan Kim
Dimitrios Bralios
Saikrishna Sanniboina
Nanyun Peng
Kai-Wei Chang
Daniel Kang
Heng Ji
KELM
AAML
311
9
0
25 Feb 2025
OmniQuery: Contextually Augmenting Captured Multimodal Memory to Enable Personal Question Answering
International Conference on Human Factors in Computing Systems (CHI), 2024
Jiahao Nick Li
Zhuohao Jerry Zhang
375
5
0
24 Feb 2025
Ask in Any Modality: A Comprehensive Survey on Multimodal Retrieval-Augmented Generation
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Mohammad Mahdi Abootorabi
Amirhosein Zobeiri
Mahdi Dehghani
Mohammadali Mohammadkhani
Bardia Mohammadi
Omid Ghahroodi
M. Baghshah
Ehsaneddin Asgari
RALM
639
27
0
12 Feb 2025
RAMQA: A Unified Framework for Retrieval-Augmented Multi-Modal Question Answering
North American Chapter of the Association for Computational Linguistics (NAACL), 2025
Yang Bai
Christan Earl Grant
Daisy Zhe Wang
RALM
197
2
0
23 Jan 2025
Multimodal Multihop Source Retrieval for Web Question Answering
Navya Yarrabelly
Saloni Mittal
100
0
0
07 Jan 2025
FCMR: Robust Evaluation of Financial Cross-Modal Multi-Hop Reasoning
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Seunghee Kim
Changhyeon Kim
Taeuk Kim
LRM
352
4
0
17 Dec 2024
Self-adaptive Multimodal Retrieval-Augmented Generation
Wenjia Zhai
VLM
163
2
0
15 Oct 2024
MRAG-Bench: Vision-Centric Evaluation for Retrieval-Augmented Multimodal Models
International Conference on Learning Representations (ICLR), 2024
Wenbo Hu
Jia-Chen Gu
Zi-Yi Dou
Mohsen Fayyaz
Pan Lu
Kai-Wei Chang
Nanyun Peng
VLM
252
23
0
10 Oct 2024
MuRAR: A Simple and Effective Multimodal Retrieval and Answer Refinement Framework for Multimodal Question Answering
International Conference on Computational Linguistics (COLING), 2024
Zhengyuan Zhu
Daniel Lee
Hong Zhang
Sai Sree Harsha
Loic Feujio
Akash Maharaj
Yunyao Li
122
6
0
16 Aug 2024
FlashRAG: A Modular Toolkit for Efficient Retrieval-Augmented Generation Research
Jiajie Jin
Yutao Zhu
Xinyu Yang
Chenghao Zhang
Zhicheng Dou
Tong Zhao
Zhao Yang
Ji-Rong Wen
VLM
295
126
0
22 May 2024
RJUA-MedDQA: A Multimodal Benchmark for Medical Document Question Answering and Clinical Reasoning
Congyun Jin
Ming Zhang
Xiaowei Ma
Yujiao Li
Yingbo Wang
...
Chenfei Chi
Xiangguo Lv
Fangzhou Li
Wei Xue
Yiran Huang
LM&MA
142
10
0
19 Feb 2024
Evaluating LLMs' Mathematical Reasoning in Financial Document Question Answering
Pragya Srivastava
Manuj Malik
Vivek Gupta
T. Ganu
Dan Roth
194
36
0
17 Feb 2024
Text-to-Image Cross-Modal Generation: A Systematic Review
Maciej Żelaszczyk
Jacek Mańdziuk
285
6
0
21 Jan 2024
Through the Lens of Core Competency: Survey on Evaluation of Large Language Models
China National Conference on Chinese Computational Linguistics (CNCCL), 2023
Ziyu Zhuang
Qiguang Chen
Longxuan Ma
Mingda Li
Yi Han
Yushan Qian
Haopeng Bai
Zixian Feng
Weinan Zhang
Ting Liu
ELM
129
22
0
15 Aug 2023
Fine-tuning Multimodal LLMs to Follow Zero-shot Demonstrative Instructions
International Conference on Learning Representations (ICLR), 2023
Juncheng Li
Kaihang Pan
Zhiqi Ge
Minghe Gao
Wei Ji
Wenqiao Zhang
Tat-Seng Chua
Siliang Tang
Hanwang Zhang
Yueting Zhuang
MLLM
188
86
0
08 Aug 2023
DialogStudio: Towards Richest and Most Diverse Unified Dataset Collection for Conversational AI
Findings (Findings), 2023
Jianguo Zhang
Kun Qian
Zhiwei Liu
Shelby Heinecke
Rui Meng
Ye Liu
Zhou Yu
Huan Wang
Silvio Savarese
Caiming Xiong
259
29
0
19 Jul 2023
Read, Look or Listen? What's Needed for Solving a Multimodal Dataset
Netta Madvil
Yonatan Bitton
Roy Schwartz
160
3
0
06 Jul 2023
Pre-Training Multi-Modal Dense Retrievers for Outside-Knowledge Visual Question Answering
International Conference on the Theory of Information Retrieval (ICTIR), 2023
Alireza Salemi
Mahta Rafiee
Hamed Zamani
133
13
0
28 Jun 2023
Table Meets LLM: Can Large Language Models Understand Structured Table Data? A Benchmark and Empirical Study
Web Search and Data Mining (WSDM), 2023
Yuan Sui
Mengyu Zhou
Mingjie Zhou
Shi Han
Dongmei Zhang
LMTD
286
151
0
22 May 2023
MPMQA: Multimodal Question Answering on Product Manuals
AAAI Conference on Artificial Intelligence (AAAI), 2023
Liangfu Zhang
Anwen Hu
Jing Zhang
Shuo Hu
Qin Jin
158
14
0
19 Apr 2023
VTQA: Visual Text Question Answering via Entity Alignment and Cross-Media Reasoning
Computer Vision and Pattern Recognition (CVPR), 2023
Kan Chen
Xiangqian Wu
CoGe
139
19
0
05 Mar 2023
Complex QA and language models hybrid architectures, Survey
Xavier Daull
P. Bellot
Emmanuel Bruno
Vincent Martin
Elisabeth Murisasco
ELM
575
17
0
17 Feb 2023
SlideVQA: A Dataset for Document Visual Question Answering on Multiple Images
AAAI Conference on Artificial Intelligence (AAAI), 2023
Ryota Tanaka
Kyosuke Nishida
Kosuke Nishida
Taku Hasegawa
Itsumi Saito
Kuniko Saito
179
140
0
12 Jan 2023
A Survey on Table-and-Text HybridQA: Concepts, Methods, Challenges and Future Directions
Dingzirui Wang
Longxu Dou
Wanxiang Che
251
7
0
27 Dec 2022
Enhancing Multi-modal and Multi-hop Question Answering via Structured Knowledge and Unified Retrieval-Generation
ACM Multimedia (ACM MM), 2022
Qian Yang
Qian Chen
Wen Wang
Baotian Hu
Min Zhang
231
32
0
16 Dec 2022
Training Vision-Language Models with Less Bimodal Supervision
Conference on Automated Knowledge Base Construction (AKBC), 2022
Elad Segal
Ben Bogin
Jonathan Berant
VLM
108
2
0
01 Nov 2022
PACIFIC: Towards Proactive Conversational Question Answering over Tabular and Textual Data in Finance
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Yang Deng
Wenqiang Lei
Wenxuan Zhang
W. Lam
Tat-Seng Chua
277
66
0
17 Oct 2022
Large Language Models are few(1)-shot Table Reasoners
Findings (Findings), 2022
Wenhu Chen
LMTD
ReLM
LRM
203
188
0
13 Oct 2022
OpenCQA: Open-ended Question Answering with Charts
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Shankar Kantharaj
Do Xuan Long
Rixie Tiffany Ko Leong
J. Tan
Enamul Hoque
Shafiq Joty
144
67
0
12 Oct 2022