Multimodal Pre-training Based on Graph Attention Network for Document Understanding

IEEE Transactions on Multimedia (IEEE TMM), 2022

25 March 2022

Jun Du

Papers citing "Multimodal Pre-training Based on Graph Attention Network for Document Understanding"

Cascaded Robust Rectification for Arbitrary Document Images

28 Nov 2025

OTCR: Optimal Transmission, Compression and Representation for Multimodal Information Extraction

17 Sep 2025

Document Image Rectification Bases on Self-Adaptive Multitask Fusion

Heng Li

Xiangping Wu

Qingcai Chen

09 May 2025

Problem Solved? Information Extraction Design Space for Layout-Rich Documents using LLMs

Gaye Colakoglu

Gürkan Solmaz

Jonathan Fürst

25 Feb 2025

Modeling Layout Reading Order as Ordering Relations for Visually-rich Document Understanding

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024

Huan Chen

...

Qi Zhang

29 Sep 2024

See then Tell: Enhancing Key Information Extraction with Vision Grounding

29 Sep 2024

DocMamba: Efficient Document Pre-training with State Space Model

AAAI Conference on Artificial Intelligence (AAAI), 2024

Pengfei Hu

Jiefeng Ma

18 Sep 2024

Deep Learning based Visually Rich Document Content Understanding: A Survey

02 Aug 2024

SRFUND: A Multi-Granularity Hierarchical Structure Reconstruction Benchmark in Form Understanding

Jiefeng Ma

Pengfei Hu

13 Jun 2024

BuDDIE: A Business Document Dataset for Multi-task Information Extraction

...

05 Apr 2024

DocLLM: A layout-aware generative language model for multimodal document understanding

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

31 Dec 2023

Document Understanding for Healthcare Referrals

IEEE International Conference on Healthcare Informatics (ICHI), 2023

Jimit Mistry

N. Arzeno

22 Sep 2023

LMDX: Language Model-based Document Information Extraction and Localization

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

...

Chen-Yu Lee

19 Sep 2023

ICDAR 2023 Competition on Structured Text Extraction from Visually-Rich Document Images

IEEE International Conference on Document Analysis and Recognition (ICDAR), 2023

...

Jingdong Wang

05 Jun 2023

RE$^2$: Region-Aware Relation Extraction from Visually Rich Documents

North American Chapter of the Association for Computational Linguistics (NAACL), 2023

24 May 2023

Deep Unrestricted Document Image Rectification

IEEE Transactions on Multimedia (IEEE TMM), 2023

Hao Feng

18 Apr 2023

PDFVQA: A New Dataset for Real-World VQA on PDF Documents

13 Apr 2023

HRDoc: Dataset and Baseline Method Toward Hierarchical Reconstruction of Document Structures

AAAI Conference on Artificial Intelligence (AAAI), 2023

Jun Du

24 Mar 2023

DocILE Benchmark for Document Information Localization and Extraction

IEEE International Conference on Document Analysis and Recognition (ICDAR), 2023

...

Antoine Doucet

11 Feb 2023