
Unified Pretraining Framework for Document Understanding

Neural Information Processing Systems (NeurIPS), 2022

22 April 2022

Jiuxiang Gu


Papers citing "Unified Pretraining Framework for Document Understanding"

50 / 78 papers shown

ColMate: Contrastive Late Interaction and Masked Text for Multimodal Document Retrieval

Ahmed Masry

Megh Thakkar

Patrice Bechard

Sathwik Tejaswi Madhusudhan

...

234

02 Nov 2025

Scaling Beyond Context: A Survey of Multimodal Retrieval-Augmented Generation for Document Understanding

425

17 Oct 2025

SynDoc: A Hybrid Discriminative-Generative Framework for Enhancing Synthetic Domain-Adaptive Document Key Information Extraction

150

27 Sep 2025

DocPolarBERT: A Pre-trained Model for Document Understanding with Relative Polar Coordinate Encoding of Layout Structures

387

11 Jul 2025

SimpleDoc: Multi-Modal Document Understanding with Dual-Cue Page Retrieval and Iterative Refinement

238

16 Jun 2025

OmniParser V2: Structured-Points-of-Thought for Unified Visual Text Parsing and Its Generality to Multimodal Large Language Models

376

22 Feb 2025

Handwritten Text Recognition: A Survey

Carlos Garrido-Munoz

Antonio Ríos-Vila

Jorge Calvo-Zaragoza

382

12 Feb 2025

OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations

Computer Vision and Pattern Recognition (CVPR), 2024

...

571

10 Dec 2024

ReLayout: Towards Real-World Document Understanding via Layout-enhanced Pre-training

International Conference on Computational Linguistics (COLING), 2024

285

14 Oct 2024

Towards an Improved Metric for Evaluating Disentangled Representations

Sahib Julka

Yashu Wang

Michael Granitzer

228

04 Oct 2024

SynJAC: Synthetic-data-driven Joint-granular Adaptation and Calibration for Domain Specific Scanned Document Key Information Extraction

323

02 Oct 2024

DocMamba: Efficient Document Pre-training with State Space Model

AAAI Conference on Artificial Intelligence (AAAI), 2024

Pengfei Hu

Jiefeng Ma

367

18 Sep 2024

Deep Learning based Visually Rich Document Content Understanding: A Survey

567

02 Aug 2024

SciPostLayout: A Dataset for Layout Analysis and Layout Generation of Scientific Posters

British Machine Vision Conference (BMVC), 2024

Shohei Tanaka

Hao Wang

Yoshitaka Ushiku

210

29 Jul 2024

ProcTag: Process Tagging for Assessing the Efficacy of Document Instruction Data

521

17 Jul 2024

DistilDoc: Knowledge Distillation for Visually-Rich Document Applications

Matthew Blaschko

474

12 Jun 2024

UnSupDLA: Towards Unsupervised Document Layout Analysis

Muhammad Zeshan Afzal

264

10 Jun 2024

Multimodal Adaptive Inference for Document Image Classification with Anytime Early Exiting

Matthew Blaschko

283

21 May 2024

DLAFormer: An End-to-End Transformer For Document Layout Analysis

298

20 May 2024

GeoContrastNet: Contrastive Key-Value Edge Learning for Language-Agnostic Document Understanding

IEEE International Conference on Document Analysis and Recognition (ICDAR), 2024

Nil Biescas

Carlos Boned Riera

Josep Lladós

Sanket Biswas

281

06 May 2024

Multi-Page Document Visual Question Answering using Self-Attention Scoring Mechanism

334

29 Apr 2024

A Hybrid Approach for Document Layout Analysis in Document images

Tahira Shehzadi

Didier Stricker

Muhammad Zeshan Afzal

282

27 Apr 2024

PDF-MVQA: A Dataset for Multimodal Information Retrieval in PDF-based Visual Question Answering

251

19 Apr 2024

LayoutLLM: Layout Instruction Tuning with Large Language Models for Document Understanding

442

123

08 Apr 2024

Noise-Aware Training of Layout-Aware Language Models

Emmanouil Koukoumidis

Arnab Nandi

VLM

282

30 Mar 2024

DOCMASTER: A Unified Platform for Annotation, Training, & Inference in Document Question-Answering

280

30 Mar 2024

OmniParser: A Unified Framework for Text Spotting, Key Information Extraction and Table Recognition

Yuliang Liu

Fei Huang

331

28 Mar 2024

Visually Guided Generative Text-Layout Pre-training for Document Intelligence

Xin Jiang

Qun Liu

Kam-Fai Wong

308

25 Mar 2024

LayoutLLM: Large Language Model Instruction Tuning for Visually Rich Document Understanding

Masato Fujitake

MLLM

243

21 Mar 2024

Transformers and Language Models in Form Understanding: A Comprehensive Review of Scanned Document Analysis

283

06 Mar 2024

TreeForm: End-to-end Annotation and Evaluation for Form Document Parsing

297

07 Feb 2024

Detect-Order-Construct: A Tree Construction based Approach for Hierarchical Document Structure Analysis

Pattern Recognition (Pattern Recogn.), 2024

353

22 Jan 2024

PEneo: Unifying Line Extraction, Line Grouping, and Entity Linking for End-to-end Document Pair Extraction

Lianwen Jin

240

07 Jan 2024

FATURA: A Multi-Layout Invoice Image Dataset for Document Analysis and Understanding

Mahmoud Limam

M. Dhiaf

Yousri Kessentini

241

20 Nov 2023

On Task-personalized Multimodal Few-shot Learning for Visually-rich Document Entity Retrieval

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

342

01 Nov 2023

Enhancing Document Information Analysis with Multi-Task Pre-training: A Robust Approach for Information Extraction in Visually-Rich Documents

IEEE International Joint Conference on Neural Networks (IJCNN), 2023

Tofik Ali

Partha Pratim Roy

261

25 Oct 2023

Vision-Enhanced Semantic Entity Recognition in Document Images via Visually-Asymmetric Consistency Learning

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

274

23 Oct 2023

DSG: An End-to-End Document Structure Generator

305

13 Oct 2023

Document Understanding for Healthcare Referrals

IEEE International Conference on Healthcare Informatics (ICHI), 2023

Jimit Mistry

N. Arzeno

MedIm

132

22 Sep 2023

SCOB: Universal Text Understanding via Character-wise Supervised Contrastive Learning with Online Text Rendering for Bridging Domain Gap

IEEE International Conference on Computer Vision (ICCV), 2023

431

21 Sep 2023

Vision Grid Transformer for Document Layout Analysis

IEEE International Conference on Computer Vision (ICCV), 2023

291

29 Aug 2023

Beyond Document Page Classification: Design, Datasets, and Challenges

IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023

Jordy Van Landeghem

Sanket Biswas

Matthew B. Blaschko

Marie-Francine Moens

270

24 Aug 2023

A Graphical Approach to Document Layout Analysis

IEEE International Conference on Document Analysis and Recognition (ICDAR), 2023

314

03 Aug 2023

RealCQA: Scientific Chart Question Answering as a Test-bed for First-Order Logic

IEEE International Conference on Document Analysis and Recognition (ICDAR), 2023

Venugopal Govindaraju

191

03 Aug 2023

SpaDen : Sparse and Dense Keypoint Estimation for Real-World Chart Understanding

IEEE International Conference on Document Analysis and Recognition (ICDAR), 2023

Venugopal Govindaraju

175

03 Aug 2023

Bridging the Performance Gap between DETR and R-CNN for Graphical Object Detection in Document Images

Muhammad Zeshan Afzal

312

23 Jun 2023

On Evaluation of Document Classification using RVL-CDIP

Stefan Larson

Gordon Lim

Kevin Leach

348

21 Jun 2023

DocumentNet: Bridging the Data Gap in Document Pre-Training

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

Alexander G. Hauptmann

H. Dai

Wei Wei

183

15 Jun 2023

DocumentCLIP: Linking Figures and Main Body Text in Reflowed Documents

Fuxiao Liu

Hao Tan

Chris Tensmeyer

CLIP VLM

333

09 Jun 2023

Do-GOOD: Towards Distribution Shift Evaluation for Pre-Trained Visual Document Understanding Models

Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2023

199

05 Jun 2023