ResearchTrend.AI
arXiv: 2304.06447
PDFVQA: A New Dataset for Real-World VQA on PDF Documents

13 April 2023
Yihao Ding
Siwen Luo
Hyunsuk Chung
S. Han

Papers citing "PDFVQA: A New Dataset for Real-World VQA on PDF Documents"

13 / 13 papers shown
LiGT: Layout-infused Generative Transformer for Visual Question Answering on Vietnamese Receipts
Thanh-Phong Le
Trung Le Chi Phan
Nghia Hieu Nguyen
Kiet Van Nguyen
ViT
44
0
0
26 Feb 2025
M3DocRAG: Multi-modal Retrieval is What You Need for Multi-page Multi-document Understanding
Jaemin Cho
Debanjan Mahata
Ozan Irsoy
Yujie He
Mohit Bansal
VLM
25
8
0
07 Nov 2024
NVLM: Open Frontier-Class Multimodal LLMs
Wenliang Dai
Nayeon Lee
Boxin Wang
Zhuoling Yang
Zihan Liu
Jon Barker
Tuomas Rintamaki
M. Shoeybi
Bryan Catanzaro
Wei Ping
MLLM
VLM
LRM
40
51
0
17 Sep 2024
The MERIT Dataset: Modelling and Efficiently Rendering Interpretable Transcripts
I. de Rodrigo
A. Sanchez-Cuadrado
J. Boal
A. J. Lopez-Lopez
VLM
21
1
0
31 Aug 2024
UDA: A Benchmark Suite for Retrieval Augmented Generation in Real-world Document Analysis
Yulong Hui
Yao Lu
Huanchen Zhang
RALM
38
9
0
21 Jun 2024
StrucTexTv3: An Efficient Vision-Language Model for Text-rich Image Perception, Comprehension, and Beyond
Pengyuan Lyu
Yulin Li
Hao Zhou
Weihong Ma
Xingyu Wan
...
Liang Wu
Chengquan Zhang
Kun Yao
Errui Ding
Jingdong Wang
36
7
0
31 May 2024
PDF-MVQA: A Dataset for Multimodal Information Retrieval in PDF-based Visual Question Answering
Yihao Ding
Kaixuan Ren
Jiabin Huang
Siwen Luo
S. Han
35
1
0
19 Apr 2024
From Image to Language: A Critical Analysis of Visual Question Answering (VQA) Approaches, Challenges, and Opportunities
Md Farhan Ishmam
Md Sakib Hossain Shovon
M. F. Mridha
Nilanjan Dey
35
36
0
01 Nov 2023
Enhancing BERT-Based Visual Question Answering through Keyword-Driven Sentence Selection
Davide Napolitano
Lorenzo Vaiani
Luca Cagliero
19
1
0
13 Oct 2023
Jaeger: A Concatenation-Based Multi-Transformer VQA Model
Jieting Long
Zewei Shi
Penghao Jiang
Yidong Gan
22
0
0
11 Oct 2023
MemSum-DQA: Adapting An Efficient Long Document Extractive Summarizer for Document Question Answering
Nianlong Gu
Yingqiang Gao
Richard H. R. Hahnloser
RALM
41
0
0
10 Oct 2023
Workshop on Document Intelligence Understanding
S. Han
Yihao Ding
Siwen Luo
J. Poon
HeeGuen Yoon
Zhe Huang
P. Duuring
E. Holden
14
0
0
31 Jul 2023
LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding
Yang Xu
Yiheng Xu
Tengchao Lv
Lei Cui
Furu Wei
...
D. Florêncio
Cha Zhang
Wanxiang Che
Min Zhang
Lidong Zhou
ViT
MLLM
145
498
0
29 Dec 2020