Unified Pretraining Framework for Document Understanding

Neural Information Processing Systems (NeurIPS), 2022
22 April 2022
Jiuxiang Gu
Jason Kuen
Vlad I. Morariu
Handong Zhao
Nikolaos Barmpalios
R. Jain
A. Nenkova
Tong Sun
arXiv:2204.10939 | GitHub (29323★)

Papers citing "Unified Pretraining Framework for Document Understanding"

50 / 78 papers shown
ColMate: Contrastive Late Interaction and Masked Text for Multimodal Document Retrieval
Ahmed Masry
Megh Thakkar
Patrice Bechard
Sathwik Tejaswi Madhusudhan
Rabiul Awal
...
Srivatsava Daruru
Enamul Hoque
Spandana Gella
Torsten Scholak
Sai Rajeswar
VLM
234
2
0
02 Nov 2025
Scaling Beyond Context: A Survey of Multimodal Retrieval-Augmented Generation for Document Understanding
Sensen Gao
Shanshan Zhao
Xu Jiang
Lunhao Duan
Yong Xien Chng
Qing-Guo Chen
Weihua Luo
Kaifu Zhang
Jia-Wang Bian
Mingming Gong
425
4
0
17 Oct 2025
SynDoc: A Hybrid Discriminative-Generative Framework for Enhancing Synthetic Domain-Adaptive Document Key Information Extraction
Yihao Ding
Soyeon Caren Han
Yanbei Jiang
Yan Li
Zechuan Li
Yifan Peng
SyDa
150
0
0
27 Sep 2025
DocPolarBERT: A Pre-trained Model for Document Understanding with Relative Polar Coordinate Encoding of Layout Structures
Benno Uthayasooriyar
Antoine Ly
Franck Vermet
Caio Corro
387
0
0
11 Jul 2025
SimpleDoc: Multi-Modal Document Understanding with Dual-Cue Page Retrieval and Iterative Refinement
Chelsi Jain
Yiran Wu
Yifan Zeng
Jiale Liu
Shengyu Dai
Zhenwen Shao
Qingyun Wu
Huazheng Wang
238
11
0
16 Jun 2025
OmniParser V2: Structured-Points-of-Thought for Unified Visual Text Parsing and Its Generality to Multimodal Large Language Models
Wenwen Yu
Zhibo Yang
Jianqiang Wan
Sibo Song
J. Tang
Wenqing Cheng
Yunxing Liu
Xiang Bai
376
19
0
22 Feb 2025
Handwritten Text Recognition: A Survey
Carlos Garrido-Munoz
Antonio Ríos-Vila
Jorge Calvo-Zaragoza
382
13
0
12 Feb 2025
OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations
Computer Vision and Pattern Recognition (CVPR), 2024
Linke Ouyang
Yuan Qu
Hongbin Zhou
Jiawei Zhu
Rui Zhang
...
Chao Xu
Bo Zhang
Ding Wang
Zhongying Tu
Bin Wang
571
72
0
10 Dec 2024
ReLayout: Towards Real-World Document Understanding via Layout-enhanced Pre-training
International Conference on Computational Linguistics (COLING), 2024
Zhouqiang Jiang
Bowen Wang
Junhao Chen
Yuta Nakashima
285
5
0
14 Oct 2024
Towards an Improved Metric for Evaluating Disentangled Representations
Sahib Julka
Yashu Wang
Michael Granitzer
228
6
0
04 Oct 2024
SynJAC: Synthetic-data-driven Joint-granular Adaptation and Calibration for Domain Specific Scanned Document Key Information Extraction
Yihao Ding
S. Han
Zechuan Li
Hyunsuk Chung
323
3
0
02 Oct 2024
DocMamba: Efficient Document Pre-training with State Space Model
AAAI Conference on Artificial Intelligence (AAAI), 2024
Pengfei Hu
Zhenrong Zhang
Jiefeng Ma
Shuhang Liu
Jun Du
Jianshu Zhang
Mamba
367
4
0
18 Sep 2024
Deep Learning based Visually Rich Document Content Understanding: A Survey
Muhammad Ali
Jean Lee
Salman Khan
Eduard Hovy
567
21
0
02 Aug 2024
SciPostLayout: A Dataset for Layout Analysis and Layout Generation of Scientific Posters
British Machine Vision Conference (BMVC), 2024
Shohei Tanaka
Hao Wang
Yoshitaka Ushiku
210
13
0
29 Jul 2024
ProcTag: Process Tagging for Assessing the Efficacy of Document Instruction Data
Yufan Shen
Chuwei Luo
Zhaoqing Zhu
Yang Chen
Qi Zheng
Zhi Yu
Jiajun Bu
Cong Yao
521
6
0
17 Jul 2024
DistilDoc: Knowledge Distillation for Visually-Rich Document Applications
Jordy Van Landeghem
Subhajit Maity
Ayan Banerjee
Matthew Blaschko
Marie-Francine Moens
Josep Lladós
Sanket Biswas
474
6
0
12 Jun 2024
UnSupDLA: Towards Unsupervised Document Layout Analysis
Talha Uddin Sheikh
Tahira Shehzadi
K. Hashmi
Didier Stricker
Muhammad Zeshan Afzal
264
4
0
10 Jun 2024
Multimodal Adaptive Inference for Document Image Classification with Anytime Early Exiting
Omar Hamed
Souhail Bakkali
Marie-Francine Moens
Matthew Blaschko
Jordy Van Landeghem
283
2
0
21 May 2024
DLAFormer: An End-to-End Transformer For Document Layout Analysis
Jiawei Wang
Kai Hu
Qiang Huo
3DV
ViT
298
16
0
20 May 2024
GeoContrastNet: Contrastive Key-Value Edge Learning for Language-Agnostic Document Understanding
IEEE International Conference on Document Analysis and Recognition (ICDAR), 2024
Nil Biescas
Carlos Boned Riera
Josep Lladós
Sanket Biswas
281
4
0
06 May 2024
Multi-Page Document Visual Question Answering using Self-Attention Scoring Mechanism
Lei Kang
Rubèn Pérez Tito
Ernest Valveny
Dimosthenis Karatzas
334
11
0
29 Apr 2024
A Hybrid Approach for Document Layout Analysis in Document images
Tahira Shehzadi
Didier Stricker
Muhammad Zeshan Afzal
282
19
0
27 Apr 2024
PDF-MVQA: A Dataset for Multimodal Information Retrieval in PDF-based Visual Question Answering
Yihao Ding
Kaixuan Ren
Jiabin Huang
Siwen Luo
S. Han
251
5
0
19 Apr 2024
LayoutLLM: Layout Instruction Tuning with Large Language Models for Document Understanding
Chuwei Luo
Yufan Shen
Zhaoqing Zhu
Qi Zheng
Zhi Yu
Cong Yao
442
123
0
08 Apr 2024
Noise-Aware Training of Layout-Aware Language Models
Ritesh Sarkhel
Xiaoqi Ren
Lauro Beltrao Costa
Guolong Su
Vincent Perot
Yanan Xie
Emmanouil Koukoumidis
Arnab Nandi
VLM
282
0
0
30 Mar 2024
DOCMASTER: A Unified Platform for Annotation, Training, & Inference in Document Question-Answering
Alex Nguyen
Zilong Wang
Jingbo Shang
Dheeraj Mekala
280
1
0
30 Mar 2024
OmniParser: A Unified Framework for Text Spotting, Key Information Extraction and Table Recognition
Jianqiang Wan
Sibo Song
Wenwen Yu
Yuliang Liu
Wenqing Cheng
Fei Huang
Xiang Bai
Cong Yao
Zhibo Yang
331
87
0
28 Mar 2024
Visually Guided Generative Text-Layout Pre-training for Document Intelligence
Zhiming Mao
Haoli Bai
Lu Hou
Jiansheng Wei
Xin Jiang
Qun Liu
Kam-Fai Wong
308
14
0
25 Mar 2024
LayoutLLM: Large Language Model Instruction Tuning for Visually Rich Document Understanding
Masato Fujitake
MLLM
243
21
0
21 Mar 2024
Transformers and Language Models in Form Understanding: A Comprehensive Review of Scanned Document Analysis
Abdelrahman Abdallah
Daniel Eberharter
Zoe Pfister
Adam Jatowt
283
17
0
06 Mar 2024
TreeForm: End-to-end Annotation and Evaluation for Form Document Parsing
Ran Zmigrod
Zhiqiang Ma
Armineh Nourbakhsh
Sameena Shah
297
5
0
07 Feb 2024
Detect-Order-Construct: A Tree Construction based Approach for Hierarchical Document Structure Analysis
Pattern Recognition (Pattern Recogn.), 2024
Jiawei Wang
Kai Hu
Zhuoyao Zhong
Lei-huan Sun
Qiang Huo
353
17
0
22 Jan 2024
PEneo: Unifying Line Extraction, Line Grouping, and Entity Linking for End-to-end Document Pair Extraction
Zening Lin
Jiapeng Wang
Teng Li
Wenhui Liao
Dayi Huang
Longfei Xiong
Lianwen Jin
240
4
0
07 Jan 2024
FATURA: A Multi-Layout Invoice Image Dataset for Document Analysis and Understanding
Mahmoud Limam
M. Dhiaf
Yousri Kessentini
241
6
0
20 Nov 2023
On Task-personalized Multimodal Few-shot Learning for Visually-rich Document Entity Retrieval
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Jiayi Chen
H. Dai
Bo Dai
Aidong Zhang
Wei Wei
342
3
0
01 Nov 2023
Enhancing Document Information Analysis with Multi-Task Pre-training: A Robust Approach for Information Extraction in Visually-Rich Documents
IEEE International Joint Conference on Neural Network (IJCNN), 2023
Tofik Ali
Partha Pratim Roy
261
1
0
25 Oct 2023
Vision-Enhanced Semantic Entity Recognition in Document Images via Visually-Asymmetric Consistency Learning
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Hao Wang
Xiahua Chen
Rui Wang
Chenhui Chu
274
2
0
23 Oct 2023
DSG: An End-to-End Document Structure Generator
Johannes Rausch
Gentiana Rashiti
Maxim Gusev
Ce Zhang
Stefan Feuerriegel
305
5
0
13 Oct 2023
Document Understanding for Healthcare Referrals
IEEE International Conference on Healthcare Informatics (ICHI), 2023
Jimit Mistry
N. Arzeno
MedIm
132
2
0
22 Sep 2023
SCOB: Universal Text Understanding via Character-wise Supervised Contrastive Learning with Online Text Rendering for Bridging Domain Gap
IEEE International Conference on Computer Vision (ICCV), 2023
Daehee Kim
Yoon Kim
Donghyun Kim
Yumin Lim
Geewook Kim
Taeho Kil
431
4
0
21 Sep 2023
Vision Grid Transformer for Document Layout Analysis
IEEE International Conference on Computer Vision (ICCV), 2023
Cheng Da
Chuwei Luo
Qi Zheng
Cong Yao
ViT
291
62
0
29 Aug 2023
Beyond Document Page Classification: Design, Datasets, and Challenges
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Jordy Van Landeghem
Sanket Biswas
Matthew B. Blaschko
Marie-Francine Moens
270
11
0
24 Aug 2023
A Graphical Approach to Document Layout Analysis
IEEE International Conference on Document Analysis and Recognition (ICDAR), 2023
Jilin Wang
Michael Krumdick
Baojia Tong
Hamima Halim
M. Sokolov
Vadym Barda
Delphine Vendryes
Christy Tanner
314
18
0
03 Aug 2023
RealCQA: Scientific Chart Question Answering as a Test-bed for First-Order Logic
IEEE International Conference on Document Analysis and Recognition (ICDAR), 2023
Saleem Ahmed
Bhavin Jawade
Shubham Pandey
S. Setlur
Venugopal Govindaraju
191
8
0
03 Aug 2023
SpaDen: Sparse and Dense Keypoint Estimation for Real-World Chart Understanding
IEEE International Conference on Document Analysis and Recognition (ICDAR), 2023
Saleem Ahmed
Pengyu Yan
David Doermann
S. Setlur
Venugopal Govindaraju
175
2
0
03 Aug 2023
Bridging the Performance Gap between DETR and R-CNN for Graphical Object Detection in Document Images
Tahira Shehzadi
K. Hashmi
D. Stricker
Marcus Liwicki
Muhammad Zeshan Afzal
312
11
0
23 Jun 2023
On Evaluation of Document Classification using RVL-CDIP
Stefan Larson
Gordon Lim
Kevin Leach
348
6
0
21 Jun 2023
DocumentNet: Bridging the Data Gap in Document Pre-Training
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Lijun Yu
Jin Miao
Xiaoyu Sun
Jiayi Chen
Alexander G. Hauptmann
H. Dai
Wei Wei
183
4
0
15 Jun 2023
DocumentCLIP: Linking Figures and Main Body Text in Reflowed Documents
Fuxiao Liu
Hao Tan
Chris Tensmeyer
CLIP
VLM
333
18
0
09 Jun 2023
Do-GOOD: Towards Distribution Shift Evaluation for Pre-Trained Visual Document Understanding Models
Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2023
Jiabang He
Yilang Hu
Lei Wang
Xingdong Xu
Ning Liu
Hui-juan Liu
Hengtao Shen
VLM
OOD
199
6
0
05 Jun 2023