SlideVQA: A Dataset for Document Visual Question Answering on Multiple
Images

SlideVQA: A Dataset for Document Visual Question Answering on Multiple Images

12 January 2023

Kyosuke Nishida

Papers citing "SlideVQA: A Dataset for Document Visual Question Answering on Multiple Images"

17 / 17 papers shown

Title
DocVideoQA: Towards Comprehensive Understanding of Document-Centric Videos through Question Answering H. Wang Kai Hu Liangcai Gao 144 0 0 20 Mar 2025
LiGT: Layout-infused Generative Transformer for Visual Question Answering on Vietnamese Receipts Thanh-Phong Le Trung Le Chi Phan Nghia Hieu Nguyen Kiet Van Nguyen ViT 44 0 0 26 Feb 2025
Quantifying Memorization and Retriever Performance in Retrieval-Augmented Vision-Language Models Peter Carragher Abhinand Jha R Raghav Kathleen M. Carley RALM 75 0 0 20 Feb 2025
Any Information Is Just Worth One Single Screenshot: Unifying Search With Visualized Information Retrieval Ze Liu Zhengyang Liang Junjie Zhou Zheng Liu Defu Lian OffRL 97 0 0 17 Feb 2025
REAL-MM-RAG: A Real-World Multi-Modal Retrieval Benchmark Navve Wasserman Roi Pony O. Naparstek Adi Raz Goldfarb Eli Schwartz Udi Barzelay Leonid Karlinsky 3DV VLM 82 1 0 17 Feb 2025
PixelWorld: Towards Perceiving Everything as Pixels Zhiheng Lyu Xueguang Ma Wenhu Chen 143 0 0 31 Jan 2025
Baichuan-Omni-1.5 Technical Report Yadong Li J. Liu Tao Zhang Tao Zhang S. Chen ... Jianhua Xu Haoze Sun Mingan Lin Zenan Zhou Weipeng Chen AuLLM 72 10 0 28 Jan 2025
VisDoM: Multi-Document QA with Visually Rich Elements Using Multimodal Retrieval-Augmented Generation Manan Suri Puneet Mathur Franck Dernoncourt Kanika Goswami Ryan Rossi Dinesh Manocha 95 3 0 14 Dec 2024
VisRAG: Vision-based Retrieval-augmented Generation on Multi-modality Documents S. Yu C. Tang Bokai Xu Junbo Cui Junhao Ran ... Zhenghao Liu Shuo Wang Xu Han Zhiyuan Liu Maosong Sun VLM 39 22 0 14 Oct 2024
JourneyBench: A Challenging One-Stop Vision-Language Understanding Benchmark of Generated Images Zhecan Wang Junzhang Liu Chia-Wei Tang Hani Alomari Anushka Sivakumar ... Haoxuan You A. Ishmam Kai-Wei Chang Shih-Fu Chang Chris Thomas CoGe VLM 61 2 0 19 Sep 2024
SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers Shraman Pramanick Rama Chellappa Subhashini Venugopalan 48 13 0 12 Jul 2024
PDF-MVQA: A Dataset for Multimodal Information Retrieval in PDF-based Visual Question Answering Yihao Ding Kaixuan Ren Jiabin Huang Siwen Luo S. Han 40 1 0 19 Apr 2024
RJUA-MedDQA: A Multimodal Benchmark for Medical Document Question Answering and Clinical Reasoning Congyun Jin Ming Zhang Xiaowei Ma Yujiao Li Yingbo Wang ... Chenfei Chi Xiangguo Lv Fangzhou Li Wei Xue Yiran Huang LM&MA 27 2 0 19 Feb 2024
BloomVQA: Assessing Hierarchical Multi-modal Comprehension Yunye Gong Robik Shrestha Jared Claypoole Michael Cogswell Arijit Ray Christopher Kanan Ajay Divakaran 28 0 0 20 Dec 2023
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models Jason W. Wei Xuezhi Wang Dale Schuurmans Maarten Bosma Brian Ichter F. Xia Ed H. Chi Quoc Le Denny Zhou LM&Ro LRM AI4CE ReLM 355 8,457 0 28 Jan 2022
Turning Tables: Generating Examples from Semi-structured Tables for Endowing Language Models with Reasoning Skills Ori Yoran Alon Talmor Jonathan Berant ReLM LRM 177 53 0 15 Jul 2021
LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding Yang Xu Yiheng Xu Tengchao Lv Lei Cui Furu Wei ... D. Florêncio Cha Zhang Wanxiang Che Min Zhang Lidong Zhou ViT MLLM 147 498 0 29 Dec 2020