Document AI: Benchmarks, Models and Applications

16 November 2021

Papers citing "Document AI: Benchmarks, Models and Applications"

45 / 45 papers shown

FlexDoc: Parameterized Sampling for Diverse Multilingual Synthetic Documents for Training Document Understanding Models

Karan Dua

Hitesh Laxmichand Patel

129

02 Oct 2025

OTCR: Optimal Transmission, Compression and Representation for Multimodal Information Extraction

17 Sep 2025

LED Benchmark: Diagnosing Structural Layout Errors for Document Layout Analysis

119

31 Jul 2025

MM-R5: MultiModal Reasoning-Enhanced ReRanker via Reinforcement Learning for Document Retrieval

271

14 Jun 2025

Doc-CoB: Enhancing Multi-Modal Document Understanding with Visual Chain-of-Boxes Reasoning

...

284

24 May 2025

Visual Text Processing: A Comprehensive Review and Unified Evaluation

...

439

30 Apr 2025

A Simple yet Effective Layout Token in Large Language Models for Document Understanding

Computer Vision and Pattern Recognition (CVPR), 2025

312

24 Mar 2025

Problem Solved? Information Extraction Design Space for Layout-Rich Documents using LLMs

Gaye Colakoglu

Gürkan Solmaz

Jonathan Fürst

312

25 Feb 2025

See then Tell: Enhancing Key Information Extraction with Vision Grounding

244

29 Sep 2024

DocMamba: Efficient Document Pre-training with State Space Model

AAAI Conference on Artificial Intelligence (AAAI), 2024

Pengfei Hu

Jiefeng Ma

296

18 Sep 2024

Deep Learning based Visually Rich Document Content Understanding: A Survey

458

02 Aug 2024

OfficeBench: Benchmarking Language Agents across Multiple Applications for Office Automation

266

26 Jul 2024

ProcTag: Process Tagging for Assessing the Efficacy of Document Instruction Data

421

17 Jul 2024

DocKylin: A Large Multimodal Model for Visual Document Understanding with Efficient Visual Slimming

372

27 Jun 2024

DistilDoc: Knowledge Distillation for Visually-Rich Document Applications

Matthew Blaschko

351

12 Jun 2024

XFormParser: A Simple and Effective Multimodal Multilingual Semi-structured Form Parser

Xiang Li

...

Zhoujun Li

228

27 May 2024

A Hybrid Approach for Document Layout Analysis in Document images

Tahira Shehzadi

Didier Stricker

Muhammad Zeshan Afzal

220

27 Apr 2024

HRVDA: High-Resolution Visual Document Assistant

Computer Vision and Pattern Recognition (CVPR), 2024

Xin Li

274

10 Apr 2024

LayoutLLM: Layout Instruction Tuning with Large Language Models for Document Understanding

370

08 Apr 2024

Can AI Models Appreciate Document Aesthetics? An Exploration of Legibility and Layout Quality in Relation to Prediction Confidence

261

27 Mar 2024

A Survey of Table Reasoning with Large Language Models

309

13 Feb 2024

DocLLM: A layout-aware generative language model for multimodal document understanding

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

276

106

31 Dec 2023

A Multi-Modal Multilingual Benchmark for Document Image Classification

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

240

25 Oct 2023

DocXChain: A Powerful Open-Source Toolchain for Document Parsing and Beyond

Cong Yao

213

19 Oct 2023

GridFormer: Towards Accurate Table Structure Recognition via Grid Prediction

ACM Multimedia (ACM MM), 2023

Jingdong Wang

286

26 Sep 2023

Comprehensive Overview of Named Entity Recognition: Models, Domain-Specific Applications and Challenges

Kalyani Pakhale

252

25 Sep 2023

Kosmos-2.5: A Multimodal Literate Model

...

260

20 Sep 2023

Attention Where It Matters: Rethinking Visual Document Understanding with Selective Region Concentration

IEEE International Conference on Computer Vision (ICCV), 2023

200

03 Sep 2023

Document AI: A Comparative Study of Transformer-Based, Graph-Based Models, and Convolutional Neural Networks For Document Layout Analysis

Sotirios Kastanas

Shaomu Tan

Yijiang He

134

29 Aug 2023

Vision Grid Transformer for Document Layout Analysis

IEEE International Conference on Computer Vision (ICCV), 2023

242

29 Aug 2023

A Graphical Approach to Document Layout Analysis

IEEE International Conference on Document Analysis and Recognition (ICDAR), 2023

166

03 Aug 2023

DocTr: Document Transformer for Structured Information Extraction in Documents

IEEE International Conference on Computer Vision (ICCV), 2023

196

16 Jul 2023

Global Structure Knowledge-Guided Relation Extraction Method for Visually-Rich Document

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

Xiaozhong Liu

214

23 May 2023

Doc2SoarGraph: Discrete Reasoning over Visually-Rich Table-Text Documents via Semantic-Oriented Hierarchical Graphs

International Conference on Language Resources and Evaluation (LREC), 2023

217

03 May 2023

Structure Diagram Recognition in Financial Announcements

IEEE International Conference on Document Analysis and Recognition (ICDAR), 2023

Jun Wang

178

26 Apr 2023

GeoLayoutLM: Geometric Pre-training for Visual Information Extraction

Computer Vision and Pattern Recognition (CVPR), 2023

256

21 Apr 2023

HRDoc: Dataset and Baseline Method Toward Hierarchical Reconstruction of Document Structures

AAAI Conference on Artificial Intelligence (AAAI), 2023

Jun Du

200

24 Mar 2023

Wukong-Reader: Multi-modal Pre-training for Fine-grained Visual Document Understanding

Annual Meeting of the Association for Computational Linguistics (ACL), 2022

...

Xin Jiang

Qun Liu

ViT

218

19 Dec 2022

XDoc: Unified Pre-training for Cross-Format Document Understanding

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022

265

06 Oct 2022

Detect, Retrieve, Comprehend: A Flexible Framework for Zero-Shot Document-Level Question Answering

289

04 Oct 2022

ERNIE-mmLayout: Multi-grained MultiModal Transformer for Document Understanding

...

Dianhai Yu

165

18 Sep 2022

Towards Complex Document Understanding By Discrete Reasoning

ACM Multimedia (ACM MM), 2022

314

25 Jul 2022

LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking

ACM Multimedia (ACM MM), 2022

663

630

18 Apr 2022

Do BERTs Learn to Use Browser User Interface? Exploring Multi-Step Tasks with Unified Vision-and-Language BERTs

Taichi Iki

Akiko Aizawa

LLMAG

189

15 Mar 2022

DiT: Self-supervised Pre-training for Document Image Transformer

ACM Multimedia (ACM MM), 2022

357

207

04 Mar 2022