2205.13724
V-Doc : Visual questions answers with Documents
27 May 2022
Yihao Ding
Zhe Huang
Runlin Wang
Yanhang Zhang
Xianru Chen
Yuzhong Ma
Hyunsuk Chung
S. Han
Papers citing
"V-Doc : Visual questions answers with Documents"
13 / 13 papers shown
Title
Joint Extraction Matters: Prompt-Based Visual Question Answering for Multi-Field Document Information Extraction
Mengsay Loem
Taiju Hosaka
32
0
0
21 Mar 2025
MDocAgent: A Multi-Modal Multi-Agent Framework for Document Understanding
S. Han
Peng Xia
Ruiyi Zhang
Tong Sun
Yun-Qing Li
Hongtu Zhu
Huaxiu Yao
VLM
87
3
0
18 Mar 2025
VisDoM: Multi-Document QA with Visually Rich Elements Using Multimodal Retrieval-Augmented Generation
Manan Suri
Puneet Mathur
Franck Dernoncourt
Kanika Goswami
Ryan Rossi
Dinesh Manocha
95
3
0
14 Dec 2024
DistilDoc: Knowledge Distillation for Visually-Rich Document Applications
Jordy Van Landeghem
Subhajit Maity
Ayan Banerjee
Matthew Blaschko
Marie-Francine Moens
Josep Lladós
Sanket Biswas
41
2
0
12 Jun 2024
PDF-MVQA: A Dataset for Multimodal Information Retrieval in PDF-based Visual Question Answering
Yihao Ding
Kaixuan Ren
Jiabin Huang
Siwen Luo
S. Han
35
1
0
19 Apr 2024
HRVDA: High-Resolution Visual Document Assistant
Chaohu Liu
Kun Yin
Haoyu Cao
Xinghua Jiang
Xin Li
Yinsong Liu
Deqiang Jiang
Xing Sun
Linli Xu
VLM
35
23
0
10 Apr 2024
Workshop on Document Intelligence Understanding
S. Han
Yihao Ding
Siwen Luo
J. Poon
HeeGuen Yoon
Zhe Huang
P. Duuring
E. Holden
14
0
0
31 Jul 2023
PDFVQA: A New Dataset for Real-World VQA on PDF Documents
Yihao Ding
Siwen Luo
Hyunsuk Chung
S. Han
22
17
0
13 Apr 2023
PiggyBack: Pretrained Visual Question Answering Environment for Backing up Non-deep Learning Professionals
Zhihao Zhang
Siwen Luo
Junyi Chen
Sijia Lai
Siqu Long
Hyunsuk Chung
S. Han
12
1
0
29 Nov 2022
Doc-GCN: Heterogeneous Graph Convolutional Networks for Document Layout Analysis
Siwen Luo
Yi Ding
Siqu Long
Josiah Poon
S. Han
GNN
10
16
0
22 Aug 2022
LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding
Yang Xu
Yiheng Xu
Tengchao Lv
Lei Cui
Furu Wei
...
D. Florêncio
Cha Zhang
Wanxiang Che
Min Zhang
Lidong Zhou
ViT
MLLM
145
498
0
29 Dec 2020
FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents
Guillaume Jaume
H. K. Ekenel
Jean-Philippe Thiran
122
355
0
27 May 2019
Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding
Akira Fukui
Dong Huk Park
Daylen Yang
Anna Rohrbach
Trevor Darrell
Marcus Rohrbach
144
1,464
0
06 Jun 2016