2111.08609
Document AI: Benchmarks, Models and Applications
16 November 2021
Lei Cui
Yiheng Xu
Tengchao Lv
Furu Wei
VLM
Papers citing "Document AI: Benchmarks, Models and Applications"
45 / 45 papers shown
Title
FlexDoc: Parameterized Sampling for Diverse Multilingual Synthetic Documents for Training Document Understanding Models
Karan Dua
Hitesh Laxmichand Patel
Puneet Mittal
Ranjeet Gupta
Amit Agarwal
Praneet Pabolu
Srikant Panda
Hansa Meghwani
Graham Horwood
Fahad Shah
SyDa
112
1
0
02 Oct 2025
OTCR: Optimal Transmission, Compression and Representation for Multimodal Information Extraction
Y. Li
Yajiao Wang
Wenhao Hu
Z. Zhang
Mengting Zhang
68
0
0
17 Sep 2025
LED Benchmark: Diagnosing Structural Layout Errors for Document Layout Analysis
Inbum Heo
Taewook Hwang
Jeesu Jung
S. Jung
3DV
114
0
0
31 Jul 2025
MM-R5: MultiModal Reasoning-Enhanced ReRanker via Reinforcement Learning for Document Retrieval
Mingjun Xu
Jinhan Dong
Jue Hou
Zehui Wang
Cunchun Li
Zhifeng Gao
Renxin Zhong
Hengxing Cai
AI4TS
LRM
260
6
0
14 Jun 2025
Doc-CoB: Enhancing Multi-Modal Document Understanding with Visual Chain-of-Boxes Reasoning
Ye Mo
Zirui Shao
Kai Ye
Xianwei Mao
Bo Zhang
...
Gang Huang
Kehan Chen
Zhou Huan
Zixu Yan
Sheng Zhou
LRM
261
3
0
24 May 2025
Visual Text Processing: A Comprehensive Review and Unified Evaluation
Yan Shu
Weichao Zeng
Fangmin Zhao
Zeyu Chen
Zhiyu Li
...
Paolo Rota
Xiang Bai
Lianwen Jin
Xu-Cheng Yin
Andrii Zadaianchuk
CoGe
420
6
0
30 Apr 2025
A Simple yet Effective Layout Token in Large Language Models for Document Understanding
Computer Vision and Pattern Recognition (CVPR), 2025
Zhaoqing Zhu
Chuwei Luo
Zirui Shao
Feiyu Gao
Hangdi Xing
Qi Zheng
Ji Zhang
304
7
0
24 Mar 2025
Problem Solved? Information Extraction Design Space for Layout-Rich Documents using LLMs
Gaye Colakoglu
Gürkan Solmaz
Jonathan Fürst
289
4
0
25 Feb 2025
See then Tell: Enhancing Key Information Extraction with Vision Grounding
Shuhang Liu
Zhenrong Zhang
Pengfei Hu
Jiefeng Ma
Jun Du
Qing Wang
Jianshu Zhang
Chenyu Liu
239
1
0
29 Sep 2024
DocMamba: Efficient Document Pre-training with State Space Model
AAAI Conference on Artificial Intelligence (AAAI), 2024
Pengfei Hu
Zhenrong Zhang
Jiefeng Ma
Shuhang Liu
Jun Du
Jianshu Zhang
Mamba
262
1
0
18 Sep 2024
Deep Learning based Visually Rich Document Content Understanding: A Survey
Muhammad Ali
Jean Lee
Salman Khan
Eduard Hovy
450
15
0
02 Aug 2024
OfficeBench: Benchmarking Language Agents across Multiple Applications for Office Automation
Zilong Wang
Yuedong Cui
Li Zhong
Zimin Zhang
Da Yin
Bill Yuchen Lin
Jingbo Shang
258
20
0
26 Jul 2024
ProcTag: Process Tagging for Assessing the Efficacy of Document Instruction Data
Yufan Shen
Chuwei Luo
Zhaoqing Zhu
Yang Chen
Qi Zheng
Zhi Yu
Jiajun Bu
Cong Yao
395
5
0
17 Jul 2024
DocKylin: A Large Multimodal Model for Visual Document Understanding with Efficient Visual Slimming
Jiaxin Zhang
Wentao Yang
Songxuan Lai
Zecheng Xie
Lianwen Jin
359
28
0
27 Jun 2024
DistilDoc: Knowledge Distillation for Visually-Rich Document Applications
Jordy Van Landeghem
Subhajit Maity
Ayan Banerjee
Matthew Blaschko
Marie-Francine Moens
Josep Lladós
Sanket Biswas
346
3
0
12 Jun 2024
XFormParser: A Simple and Effective Multimodal Multilingual Semi-structured Form Parser
Xianfu Cheng
Hang Zhang
Zhiqiang Wang
Xiang Li
Weixiao Zhou
...
Fei Liu
Wei Zhang
Tao Sun
Tongliang Li
Zhoujun Li
228
4
0
27 May 2024
A Hybrid Approach for Document Layout Analysis in Document images
Tahira Shehzadi
Didier Stricker
Muhammad Zeshan Afzal
202
11
0
27 Apr 2024
HRVDA: High-Resolution Visual Document Assistant
Computer Vision and Pattern Recognition (CVPR), 2024
Chaohu Liu
Kun Yin
Haoyu Cao
Xinghua Jiang
Xin Li
Yinsong Liu
Deqiang Jiang
Xing Sun
Linli Xu
VLM
267
31
0
10 Apr 2024
LayoutLLM: Layout Instruction Tuning with Large Language Models for Document Understanding
Chuwei Luo
Yufan Shen
Zhaoqing Zhu
Qi Zheng
Zhi Yu
Cong Yao
361
93
0
08 Apr 2024
Can AI Models Appreciate Document Aesthetics? An Exploration of Legibility and Layout Quality in Relation to Prediction Confidence
Hsiu-Wei Yang
Abhinav Agrawal
Pavlos Fragkogiannis
Shubham Nitin Mulay
242
2
0
27 Mar 2024
A Survey of Table Reasoning with Large Language Models
Xuanliang Zhang
Dingzirui Wang
Longxu Dou
Qingfu Zhu
Wanxiang Che
LMTD
LRM
295
22
0
13 Feb 2024
DocLLM: A layout-aware generative language model for multimodal document understanding
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Dongsheng Wang
Natraj Raman
Mathieu Sibue
Zhiqiang Ma
Petr Babkin
Simerjot Kaur
Yulong Pei
Armineh Nourbakhsh
Xiaomo Liu
VLM
248
101
0
31 Dec 2023
A Multi-Modal Multilingual Benchmark for Document Image Classification
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Yoshinari Fujinuma
Siddharth Varia
Nishant Sankaran
Srikar Appalaraju
Bonan Min
Yogarshi Vyas
VLM
215
5
0
25 Oct 2023
DocXChain: A Powerful Open-Source Toolchain for Document Parsing and Beyond
Cong Yao
196
10
0
19 Oct 2023
GridFormer: Towards Accurate Table Structure Recognition via Grid Prediction
ACM Multimedia (ACM MM), 2023
Pengyuan Lyu
Weihong Ma
Hongyi Wang
Yu Yu
Chengquan Zhang
Kun Yao
Yang Xue
Jingdong Wang
LMTD
282
18
0
26 Sep 2023
Comprehensive Overview of Named Entity Recognition: Models, Domain-Specific Applications and Challenges
Kalyani Pakhale
247
34
0
25 Sep 2023
Kosmos-2.5: A Multimodal Literate Model
Tengchao Lv
Yupan Huang
Jingye Chen
Lei Cui
Shuming Ma
...
Weiyao Luo
Shaoxiang Wu
Guoxin Wang
Cha Zhang
Furu Wei
VLM
MLLM
256
87
0
20 Sep 2023
Attention Where It Matters: Rethinking Visual Document Understanding with Selective Region Concentration
IEEE International Conference on Computer Vision (ICCV), 2023
H. Cao
Changcun Bao
Chaohu Liu
Huang-wei Chen
Kun Yin
Hao Liu
Yinsong Liu
Deqiang Jiang
Xing Sun
194
16
0
03 Sep 2023
Document AI: A Comparative Study of Transformer-Based, Graph-Based Models, and Convolutional Neural Networks For Document Layout Analysis
Sotirios Kastanas
Shaomu Tan
Yijiang He
129
1
0
29 Aug 2023
Vision Grid Transformer for Document Layout Analysis
IEEE International Conference on Computer Vision (ICCV), 2023
Cheng Da
Chuwei Luo
Qi Zheng
Cong Yao
ViT
214
50
0
29 Aug 2023
A Graphical Approach to Document Layout Analysis
IEEE International Conference on Document Analysis and Recognition (ICDAR), 2023
Jilin Wang
Michael Krumdick
Baojia Tong
Hamima Halim
M. Sokolov
Vadym Barda
Delphine Vendryes
Christy Tanner
156
14
0
03 Aug 2023
DocTr: Document Transformer for Structured Information Extraction in Documents
IEEE International Conference on Computer Vision (ICCV), 2023
Haofu Liao
Aruni RoyChowdhury
Weijian Li
Ankan Bansal
Yuting Zhang
Zhuowen Tu
R. Satzoda
R. Manmatha
Vijay Mahadevan
196
22
0
16 Jul 2023
Global Structure Knowledge-Guided Relation Extraction Method for Visually-Rich Document
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Xiangnan Chen
Qianwen Xiao
Juncheng Li
Duo Dong
Jun Lin
Xiaozhong Liu
Siliang Tang
206
6
0
23 May 2023
Doc2SoarGraph: Discrete Reasoning over Visually-Rich Table-Text Documents via Semantic-Oriented Hierarchical Graphs
International Conference on Language Resources and Evaluation (LREC), 2023
Fengbin Zhu
Chao Wang
Fuli Feng
Zifeng Ren
Moxin Li
Tat-Seng Chua
200
6
0
03 May 2023
Structure Diagram Recognition in Financial Announcements
IEEE International Conference on Document Analysis and Recognition (ICDAR), 2023
Meixuan Qiao
Jun Wang
Junfu Xiang
Qiyu Hou
Ruixuan Li
163
2
0
26 Apr 2023
GeoLayoutLM: Geometric Pre-training for Visual Information Extraction
Computer Vision and Pattern Recognition (CVPR), 2023
Chuwei Luo
Changxu Cheng
Qi Zheng
Cong Yao
244
61
0
21 Apr 2023
HRDoc: Dataset and Baseline Method Toward Hierarchical Reconstruction of Document Structures
AAAI Conference on Artificial Intelligence (AAAI), 2023
Jiefeng Ma
Jun Du
Pengfei Hu
Zhenrong Zhang
Jianshu Zhang
Huihui Zhu
Cong Liu
192
18
0
24 Mar 2023
Wukong-Reader: Multi-modal Pre-training for Fine-grained Visual Document Understanding
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Haoli Bai
Zhiguang Liu
Xiaojun Meng
Wentao Li
Shuangning Liu
...
Liangwei Wang
Lu Hou
Jiansheng Wei
Xin Jiang
Qun Liu
ViT
218
16
0
19 Dec 2022
XDoc: Unified Pre-training for Cross-Format Document Understanding
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Jingye Chen
Tengchao Lv
Lei Cui
Changrong Zhang
Furu Wei
261
16
0
06 Oct 2022
Detect, Retrieve, Comprehend: A Flexible Framework for Zero-Shot Document-Level Question Answering
T. McDonald
Brian Tsan
Amar Saini
Juanita Ordoñez
Luis Gutierrez
Phan-Anh-Huy Nguyen
Blake Mason
Brenda Ng
RALM
276
3
0
04 Oct 2022
ERNIE-mmLayout: Multi-grained MultiModal Transformer for Document Understanding
Wenjin Wang
Zhengjie Huang
Bin Luo
Qianglong Chen
Qiming Peng
...
Weichong Yin
Shi Feng
Yu Sun
Dianhai Yu
Yin Zhang
ViT
148
14
0
18 Sep 2022
Towards Complex Document Understanding By Discrete Reasoning
ACM Multimedia (ACM MM), 2022
Fengbin Zhu
Wenqiang Lei
Fuli Feng
Chao Wang
Haozhou Zhang
Tat-Seng Chua
287
82
0
25 Jul 2022
LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking
ACM Multimedia (ACM MM), 2022
Yupan Huang
Tengchao Lv
Lei Cui
Yutong Lu
Furu Wei
635
621
0
18 Apr 2022
Do BERTs Learn to Use Browser User Interface? Exploring Multi-Step Tasks with Unified Vision-and-Language BERTs
Taichi Iki
Akiko Aizawa
LLMAG
183
6
0
15 Mar 2022
DiT: Self-supervised Pre-training for Document Image Transformer
ACM Multimedia (ACM MM), 2022
Junlong Li
Yiheng Xu
Tengchao Lv
Lei Cui
Chaoxi Zhang
Furu Wei
ViT
VLM
357
206
0
04 Mar 2022