PDFVQA: A New Dataset for Real-World VQA on PDF Documents

13 April 2023

Papers citing "PDFVQA: A New Dataset for Real-World VQA on PDF Documents"

Title
LiGT: Layout-infused Generative Transformer for Visual Question Answering on Vietnamese Receipts | Thanh-Phong Le Trung Le Chi Phan Nghia Hieu Nguyen Kiet Van Nguyen | ViT | 44 0 0 | 26 Feb 2025
M3DocRAG: Multi-modal Retrieval is What You Need for Multi-page Multi-document Understanding | Jaemin Cho Debanjan Mahata Ozan Irsoy Yujie He Mohit Bansal | VLM | 25 8 0 | 07 Nov 2024
NVLM: Open Frontier-Class Multimodal LLMs | Wenliang Dai Nayeon Lee Boxin Wang Zhuoling Yang Zihan Liu Jon Barker Tuomas Rintamaki M. Shoeybi Bryan Catanzaro Wei Ping | MLLM VLM LRM | 40 51 0 | 17 Sep 2024
The MERIT Dataset: Modelling and Efficiently Rendering Interpretable Transcripts | I. de Rodrigo A. Sanchez-Cuadrado J. Boal A. J. Lopez-Lopez | VLM | 21 1 0 | 31 Aug 2024
UDA: A Benchmark Suite for Retrieval Augmented Generation in Real-world Document Analysis | Yulong Hui Yao Lu Huanchen Zhang | RALM | 38 9 0 | 21 Jun 2024
StrucTexTv3: An Efficient Vision-Language Model for Text-rich Image Perception, Comprehension, and Beyond | Pengyuan Lyu Yulin Li Hao Zhou Weihong Ma Xingyu Wan ... Liang Wu Chengquan Zhang Kun Yao Errui Ding Jingdong Wang | | 36 7 0 | 31 May 2024
PDF-MVQA: A Dataset for Multimodal Information Retrieval in PDF-based Visual Question Answering | Yihao Ding Kaixuan Ren Jiabin Huang Siwen Luo S. Han | | 35 1 0 | 19 Apr 2024
From Image to Language: A Critical Analysis of Visual Question Answering (VQA) Approaches, Challenges, and Opportunities | Md Farhan Ishmam Md Sakib Hossain Shovon M. F. Mridha Nilanjan Dey | | 35 36 0 | 01 Nov 2023
Enhancing BERT-Based Visual Question Answering through Keyword-Driven Sentence Selection | Davide Napolitano Lorenzo Vaiani Luca Cagliero | | 19 1 0 | 13 Oct 2023
Jaeger: A Concatenation-Based Multi-Transformer VQA Model | Jieting Long Zewei Shi Penghao Jiang Yidong Gan | | 22 0 0 | 11 Oct 2023
MemSum-DQA: Adapting An Efficient Long Document Extractive Summarizer for Document Question Answering | Nianlong Gu Yingqiang Gao Richard H. R. Hahnloser | RALM | 41 0 0 | 10 Oct 2023
Workshop on Document Intelligence Understanding | S. Han Yihao Ding Siwen Luo J. Poon HeeGuen Yoon Zhe Huang P. Duuring E. Holden | | 14 0 0 | 31 Jul 2023
LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding | Yang Xu Yiheng Xu Tengchao Lv Lei Cui Furu Wei ... D. Florêncio Cha Zhang Wanxiang Che Min Zhang Lidong Zhou | ViT MLLM | 145 498 0 | 29 Dec 2020