Document AI: Benchmarks, Models and Applications


16 November 2021
Lei Cui
Yiheng Xu
Tengchao Lv
Furu Wei
    VLM

Papers citing "Document AI: Benchmarks, Models and Applications"

45 / 45 papers shown
FlexDoc: Parameterized Sampling for Diverse Multilingual Synthetic Documents for Training Document Understanding Models
Karan Dua
Hitesh Laxmichand Patel
Puneet Mittal
Ranjeet Gupta
Amit Agarwal
Praneet Pabolu
Srikant Panda
Hansa Meghwani
Graham Horwood
Fahad Shah
SyDa
112
1
0
02 Oct 2025
OTCR: Optimal Transmission, Compression and Representation for Multimodal Information Extraction
Y. Li
Yajiao Wang
Wenhao Hu
Z. Zhang
Mengting Zhang
68
0
0
17 Sep 2025
LED Benchmark: Diagnosing Structural Layout Errors for Document Layout Analysis
Inbum Heo
Taewook Hwang
Jeesu Jung
S. Jung
3DV
114
0
0
31 Jul 2025
MM-R5: MultiModal Reasoning-Enhanced ReRanker via Reinforcement Learning for Document Retrieval
Mingjun Xu
Jinhan Dong
Jue Hou
Zehui Wang
Cunchun Li
Zhifeng Gao
Renxin Zhong
Hengxing Cai
AI4TS LRM
260
6
0
14 Jun 2025
Doc-CoB: Enhancing Multi-Modal Document Understanding with Visual Chain-of-Boxes Reasoning
Ye Mo
Zirui Shao
Kai Ye
Xianwei Mao
Bo Zhang
...
Gang Huang
Kehan Chen
Zhou Huan
Zixu Yan
Sheng Zhou
LRM
261
3
0
24 May 2025
Visual Text Processing: A Comprehensive Review and Unified Evaluation
Yan Shu
Weichao Zeng
Fangmin Zhao
Zeyu Chen
Zhiyu Li
...
Paolo Rota
Xiang Bai
Lianwen Jin
Xu-Cheng Yin
Andrii Zadaianchuk
CoGe
420
6
0
30 Apr 2025
A Simple yet Effective Layout Token in Large Language Models for Document Understanding
Computer Vision and Pattern Recognition (CVPR), 2025
Zhaoqing Zhu
Chuwei Luo
Zirui Shao
Feiyu Gao
Hangdi Xing
Qi Zheng
Ji Zhang
304
7
0
24 Mar 2025
Problem Solved? Information Extraction Design Space for Layout-Rich Documents using LLMs
Gaye Colakoglu
Gürkan Solmaz
Jonathan Fürst
289
4
0
25 Feb 2025
See then Tell: Enhancing Key Information Extraction with Vision Grounding
Shuhang Liu
Zhenrong Zhang
Pengfei Hu
Jiefeng Ma
Jun Du
Qing Wang
Jianshu Zhang
Chenyu Liu
239
1
0
29 Sep 2024
DocMamba: Efficient Document Pre-training with State Space Model
AAAI Conference on Artificial Intelligence (AAAI), 2024
Pengfei Hu
Zhenrong Zhang
Jiefeng Ma
Shuhang Liu
Jun Du
Jianshu Zhang
Mamba
262
1
0
18 Sep 2024
Deep Learning based Visually Rich Document Content Understanding: A Survey
Muhammad Ali
Jean Lee
Salman Khan
Eduard Hovy
450
15
0
02 Aug 2024
OfficeBench: Benchmarking Language Agents across Multiple Applications for Office Automation
Zilong Wang
Yuedong Cui
Li Zhong
Zimin Zhang
Da Yin
Bill Yuchen Lin
Jingbo Shang
258
20
0
26 Jul 2024
ProcTag: Process Tagging for Assessing the Efficacy of Document Instruction Data
Yufan Shen
Chuwei Luo
Zhaoqing Zhu
Yang Chen
Qi Zheng
Zhi Yu
Jiajun Bu
Cong Yao
395
5
0
17 Jul 2024
DocKylin: A Large Multimodal Model for Visual Document Understanding with Efficient Visual Slimming
Jiaxin Zhang
Wentao Yang
Songxuan Lai
Zecheng Xie
Lianwen Jin
359
28
0
27 Jun 2024
DistilDoc: Knowledge Distillation for Visually-Rich Document Applications
Jordy Van Landeghem
Subhajit Maity
Ayan Banerjee
Matthew Blaschko
Marie-Francine Moens
Josep Lladós
Sanket Biswas
346
3
0
12 Jun 2024
XFormParser: A Simple and Effective Multimodal Multilingual Semi-structured Form Parser
Xianfu Cheng
Hang Zhang
Zhiqiang Wang
Xiang Li
Weixiao Zhou
...
Fei Liu
Wei Zhang
Tao Sun
Tongliang Li
Zhoujun Li
228
4
0
27 May 2024
A Hybrid Approach for Document Layout Analysis in Document images
Tahira Shehzadi
Didier Stricker
Muhammad Zeshan Afzal
202
11
0
27 Apr 2024
HRVDA: High-Resolution Visual Document Assistant
Computer Vision and Pattern Recognition (CVPR), 2024
Chaohu Liu
Kun Yin
Haoyu Cao
Xinghua Jiang
Xin Li
Yinsong Liu
Deqiang Jiang
Xing Sun
Linli Xu
VLM
267
31
0
10 Apr 2024
LayoutLLM: Layout Instruction Tuning with Large Language Models for Document Understanding
Chuwei Luo
Yufan Shen
Zhaoqing Zhu
Qi Zheng
Zhi Yu
Cong Yao
361
93
0
08 Apr 2024
Can AI Models Appreciate Document Aesthetics? An Exploration of Legibility and Layout Quality in Relation to Prediction Confidence
Hsiu-Wei Yang
Abhinav Agrawal
Pavlos Fragkogiannis
Shubham Nitin Mulay
242
2
0
27 Mar 2024
A Survey of Table Reasoning with Large Language Models
Xuanliang Zhang
Dingzirui Wang
Longxu Dou
Qingfu Zhu
Wanxiang Che
LMTD LRM
295
22
0
13 Feb 2024
DocLLM: A layout-aware generative language model for multimodal document understanding
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Dongsheng Wang
Natraj Raman
Mathieu Sibue
Zhiqiang Ma
Petr Babkin
Simerjot Kaur
Yulong Pei
Armineh Nourbakhsh
Xiaomo Liu
VLM
248
101
0
31 Dec 2023
A Multi-Modal Multilingual Benchmark for Document Image Classification
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Yoshinari Fujinuma
Siddharth Varia
Nishant Sankaran
Srikar Appalaraju
Bonan Min
Yogarshi Vyas
VLM
215
5
0
25 Oct 2023
DocXChain: A Powerful Open-Source Toolchain for Document Parsing and Beyond
Cong Yao
196
10
0
19 Oct 2023
GridFormer: Towards Accurate Table Structure Recognition via Grid Prediction
ACM Multimedia (ACM MM), 2023
Pengyuan Lyu
Weihong Ma
Hongyi Wang
Yu Yu
Chengquan Zhang
Kun Yao
Yang Xue
Jingdong Wang
LMTD
282
18
0
26 Sep 2023
Comprehensive Overview of Named Entity Recognition: Models, Domain-Specific Applications and Challenges
Kalyani Pakhale
247
34
0
25 Sep 2023
Kosmos-2.5: A Multimodal Literate Model
Tengchao Lv
Yupan Huang
Jingye Chen
Lei Cui
Shuming Ma
...
Weiyao Luo
Shaoxiang Wu
Guoxin Wang
Cha Zhang
Furu Wei
VLM MLLM
256
87
0
20 Sep 2023
Attention Where It Matters: Rethinking Visual Document Understanding with Selective Region Concentration
IEEE International Conference on Computer Vision (ICCV), 2023
H. Cao
Changcun Bao
Chaohu Liu
Huang-wei Chen
Kun Yin
Hao Liu
Yinsong Liu
Deqiang Jiang
Xing Sun
194
16
0
03 Sep 2023
Document AI: A Comparative Study of Transformer-Based, Graph-Based Models, and Convolutional Neural Networks For Document Layout Analysis
Sotirios Kastanas
Shaomu Tan
Yijiang He
129
1
0
29 Aug 2023
Vision Grid Transformer for Document Layout Analysis
IEEE International Conference on Computer Vision (ICCV), 2023
Cheng Da
Chuwei Luo
Qi Zheng
Cong Yao
ViT
214
50
0
29 Aug 2023
A Graphical Approach to Document Layout Analysis
IEEE International Conference on Document Analysis and Recognition (ICDAR), 2023
Jilin Wang
Michael Krumdick
Baojia Tong
Hamima Halim
M. Sokolov
Vadym Barda
Delphine Vendryes
Christy Tanner
156
14
0
03 Aug 2023
DocTr: Document Transformer for Structured Information Extraction in Documents
IEEE International Conference on Computer Vision (ICCV), 2023
Haofu Liao
Aruni RoyChowdhury
Weijian Li
Ankan Bansal
Yuting Zhang
Zhuowen Tu
R. Satzoda
R. Manmatha
Vijay Mahadevan
196
22
0
16 Jul 2023
Global Structure Knowledge-Guided Relation Extraction Method for Visually-Rich Document
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Xiangnan Chen
Qianwen Xiao
Juncheng Li
Duo Dong
Jun Lin
Xiaozhong Liu
Siliang Tang
206
6
0
23 May 2023
Doc2SoarGraph: Discrete Reasoning over Visually-Rich Table-Text Documents via Semantic-Oriented Hierarchical Graphs
International Conference on Language Resources and Evaluation (LREC), 2023
Fengbin Zhu
Chao Wang
Fuli Feng
Zifeng Ren
Moxin Li
Tat-Seng Chua
200
6
0
03 May 2023
Structure Diagram Recognition in Financial Announcements
IEEE International Conference on Document Analysis and Recognition (ICDAR), 2023
Meixuan Qiao
Jun Wang
Junfu Xiang
Qiyu Hou
Ruixuan Li
163
2
0
26 Apr 2023
GeoLayoutLM: Geometric Pre-training for Visual Information Extraction
Computer Vision and Pattern Recognition (CVPR), 2023
Chuwei Luo
Changxu Cheng
Qi Zheng
Cong Yao
244
61
0
21 Apr 2023
HRDoc: Dataset and Baseline Method Toward Hierarchical Reconstruction of Document Structures
AAAI Conference on Artificial Intelligence (AAAI), 2023
Jiefeng Ma
Jun Du
Pengfei Hu
Zhenrong Zhang
Jianshu Zhang
Huihui Zhu
Cong Liu
192
18
0
24 Mar 2023
Wukong-Reader: Multi-modal Pre-training for Fine-grained Visual Document Understanding
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Haoli Bai
Zhiguang Liu
Xiaojun Meng
Wentao Li
Shuangning Liu
...
Liangwei Wang
Lu Hou
Jiansheng Wei
Xin Jiang
Qun Liu
ViT
218
16
0
19 Dec 2022
XDoc: Unified Pre-training for Cross-Format Document Understanding
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Jingye Chen
Tengchao Lv
Lei Cui
Changrong Zhang
Furu Wei
261
16
0
06 Oct 2022
Detect, Retrieve, Comprehend: A Flexible Framework for Zero-Shot Document-Level Question Answering
T. McDonald
Brian Tsan
Amar Saini
Juanita Ordoñez
Luis Gutierrez
Phan-Anh-Huy Nguyen
Blake Mason
Brenda Ng
RALM
276
3
0
04 Oct 2022
ERNIE-mmLayout: Multi-grained MultiModal Transformer for Document Understanding
Wenjin Wang
Zhengjie Huang
Bin Luo
Qianglong Chen
Qiming Peng
...
Weichong Yin
Shi Feng
Yu Sun
Dianhai Yu
Yin Zhang
ViT
148
14
0
18 Sep 2022
Towards Complex Document Understanding By Discrete Reasoning
ACM Multimedia (ACM MM), 2022
Fengbin Zhu
Wenqiang Lei
Fuli Feng
Chao Wang
Haozhou Zhang
Tat-Seng Chua
287
82
0
25 Jul 2022
LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking
ACM Multimedia (ACM MM), 2022
Yupan Huang
Tengchao Lv
Lei Cui
Yutong Lu
Furu Wei
635
621
0
18 Apr 2022
Do BERTs Learn to Use Browser User Interface? Exploring Multi-Step Tasks with Unified Vision-and-Language BERTs
Taichi Iki
Akiko Aizawa
LLMAG
183
6
0
15 Mar 2022
DiT: Self-supervised Pre-training for Document Image Transformer
ACM Multimedia (ACM MM), 2022
Junlong Li
Yiheng Xu
Tengchao Lv
Lei Cui
Chaoxi Zhang
Furu Wei
ViT VLM
357
206
0
04 Mar 2022