1912.13318
LayoutLM: Pre-training of Text and Layout for Document Image Understanding
31 December 2019
Yiheng Xu
Minghao Li
Lei Cui
Shaohan Huang
Furu Wei
Ming Zhou
Papers citing
"LayoutLM: Pre-training of Text and Layout for Document Image Understanding"
50 / 371 papers shown
RoDLA: Benchmarking the Robustness of Document Layout Analysis Models
Yufan Chen
Jiaming Zhang
Kunyu Peng
Junwei Zheng
Ruiping Liu
Philip Torr
Rainer Stiefelhagen
OOD
29
5
0
21 Mar 2024
LayoutLLM: Large Language Model Instruction Tuning for Visually Rich Document Understanding
Masato Fujitake
MLLM
27
15
0
21 Mar 2024
From Pixels to Insights: A Survey on Automatic Chart Understanding in the Era of Large Foundation Models
Kung-Hsiang Huang
Hou Pong Chan
Yi R. Fung
Haoyi Qiu
Mingyang Zhou
Chenyu You
Shih-Fu Chang
Chenhui Xu
AI4TS
72
18
0
18 Mar 2024
depyf: Open the Opaque Box of PyTorch Compiler for Machine Learning Researchers
Kaichao You
Runsheng Bai
Meng Cao
Jianmin Wang
Ion Stoica
Mingsheng Long
VLM
33
0
0
14 Mar 2024
Adversarial Training with OCR Modality Perturbation for Scene-Text Visual Question Answering
Zhixuan Shen
Haonan Luo
Sijia Li
Tianrui Li
26
0
0
14 Mar 2024
The future of document indexing: GPT and Donut revolutionize table of content processing
Degaga Wolde Feyisa
Haylemicheal Berihun
Amanuel Zewdu
Mahsa Najimoghadam
Marzieh Zare
34
0
0
12 Mar 2024
TextMonkey: An OCR-Free Large Multimodal Model for Understanding Document
Yuliang Liu
Biao Yang
Qiang Liu
Zhang Li
Zhiyin Ma
Shuo Zhang
Xiang Bai
MLLM
VLM
49
91
0
07 Mar 2024
Transformers and Language Models in Form Understanding: A Comprehensive Review of Scanned Document Analysis
Abdelrahman Abdallah
Daniel Eberharter
Zoe Pfister
Adam Jatowt
40
12
0
06 Mar 2024
LOCR: Location-Guided Transformer for Optical Character Recognition
Yu Sun
Dongzhan Zhou
Chen Lin
Conghui He
Wanli Ouyang
Han-Sen Zhong
40
1
0
04 Mar 2024
Hypertext Entity Extraction in Webpage
Yifei Yang
Tianqiao Liu
Bo Shao
Hai Zhao
Linjun Shou
Ming Gong
Daxin Jiang
44
0
0
04 Mar 2024
Enhancing Visual Document Understanding with Contrastive Learning in Large Visual-Language Models
Xin Li
Yunfei Wu
Xinghua Jiang
Zhihao Guo
Ming Gong
Haoyu Cao
Yinsong Liu
Deqiang Jiang
Xing Sun
VLM
37
12
0
29 Feb 2024
CoCo-Agent: A Comprehensive Cognitive MLLM Agent for Smartphone GUI Automation
Xinbei Ma
ZhuoSheng Zhang
Hai Zhao
LLMAG
41
23
0
19 Feb 2024
LAPDoc: Layout-Aware Prompting for Documents
Marcel Lamott
Yves-Noel Weweler
A. Ulges
Faisal Shafait
Dirk Krechel
Darko Obradovic
54
5
0
15 Feb 2024
ClusterTabNet: Supervised clustering method for table detection and table structure recognition
Marek Polewczyk
Marco Spinaci
LMTD
35
0
0
12 Feb 2024
Text Role Classification in Scientific Charts Using Multimodal Transformers
Hye Jin Kim
N. Lell
A. Scherp
24
0
0
08 Feb 2024
ANLS* -- A Universal Document Processing Metric for Generative Large Language Models
David Peer
Philemon Schöpf
V. Nebendahl
A. Rietzler
Sebastian Stabinger
35
3
0
06 Feb 2024
Large Language Model for Table Processing: A Survey
Weizheng Lu
Jiaming Zhang
Jing Zhang
Yueguo Chen
LMTD
63
26
0
04 Feb 2024
Instruction Makes a Difference
Tosin Adewumi
Nudrat Habib
Lama Alkhaled
Elisa Barney
VLM
MLLM
21
1
0
01 Feb 2024
InstructDoc: A Dataset for Zero-Shot Generalization of Visual Document Understanding with Instructions
Ryota Tanaka
Taichi Iki
Kyosuke Nishida
Kuniko Saito
Jun Suzuki
VLM
21
23
0
24 Jan 2024
UniVIE: A Unified Label Space Approach to Visual Information Extraction from Form-like Documents
Kai Hu
Jiawei Wang
Weihong Lin
Zhuoyao Zhong
Lei-huan Sun
Qiang Huo
42
1
0
17 Jan 2024
PEneo: Unifying Line Extraction, Line Grouping, and Entity Linking for End-to-end Document Pair Extraction
Zening Lin
Jiapeng Wang
Teng Li
Wenhui Liao
Dayi Huang
Longfei Xiong
Lianwen Jin
26
2
0
07 Jan 2024
GRAM: Global Reasoning for Multi-Page VQA
Tsachi Blau
Sharon Fogel
Roi Ronen
Alona Golts
Roy Ganz
Elad Ben Avraham
Aviad Aberdam
Shahar Tsiper
Ron Litman
19
12
0
07 Jan 2024
DocGraphLM: Documental Graph Language Model for Information Extraction
Dongsheng Wang
Zhiqiang Ma
Armineh Nourbakhsh
Kang Gu
Sameena Shah
38
8
0
05 Jan 2024
LORE++: Logical Location Regression Network for Table Structure Recognition with Pre-training
Rujiao Long
Hangdi Xing
Zhibo Yang
Qi Zheng
Zhi Yu
Cong Yao
Fei Huang
29
4
0
03 Jan 2024
DocLLM: A layout-aware generative language model for multimodal document understanding
Dongsheng Wang
Natraj Raman
Mathieu Sibue
Zhiqiang Ma
Petr Babkin
Simerjot Kaur
Yulong Pei
Armineh Nourbakhsh
Xiaomo Liu
VLM
22
53
0
31 Dec 2023
LANS: A Layout-Aware Neural Solver for Plane Geometry Problem
Zhong-Zhi Li
Ming-Liang Zhang
Fei Yin
Cheng-Lin Liu
21
14
0
25 Nov 2023
EIGEN: Expert-Informed Joint Learning Aggregation for High-Fidelity Information Extraction from Document Images
A. Singh
Venkatapathy Subramanian
Ayush Maheshwari
Pradeep Narayan
D. P. Shetty
Ganesh Ramakrishnan
17
3
0
23 Nov 2023
Towards Improving Document Understanding: An Exploration on Text-Grounding via MLLMs
Yonghui Wang
Wen-gang Zhou
Hao Feng
Keyi Zhou
Houqiang Li
66
19
0
22 Nov 2023
FATURA: A Multi-Layout Invoice Image Dataset for Document Analysis and Understanding
Mahmoud Limam
M. Dhiaf
Yousri Kessentini
23
2
0
20 Nov 2023
DocPedia: Unleashing the Power of Large Multimodal Model in the Frequency Domain for Versatile Document Understanding
Hao Feng
Qi Liu
Hao Liu
Wen-gang Zhou
Houqiang Li
Can Huang
VLM
25
63
0
20 Nov 2023
Multiple-Question Multiple-Answer Text-VQA
Peng Tang
Srikar Appalaraju
R. Manmatha
Yusheng Xie
Vijay Mahadevan
46
5
0
15 Nov 2023
ETDPC: A Multimodality Framework for Classifying Pages in Electronic Theses and Dissertations
Muntabir Hasan Choudhury
Lamia Salsabil
William A. Ingram
Edward A. Fox
Jian Wu
35
0
0
07 Nov 2023
On Task-personalized Multimodal Few-shot Learning for Visually-rich Document Entity Retrieval
Jiayi Chen
H. Dai
Bo Dai
Aidong Zhang
Wei Wei
36
2
0
01 Nov 2023
A Scalable Framework for Table of Contents Extraction from Complex ESG Annual Reports
Xinyu Wang
Lin Gui
Yulan He
LMTD
31
2
0
27 Oct 2023
Exploring OCR Capabilities of GPT-4V(ision) : A Quantitative and In-depth Evaluation
Yongxin Shi
Dezhi Peng
Wenhui Liao
Zening Lin
Xinhong Chen
Chongyu Liu
Yuyi Zhang
Lianwen Jin
MLLM
30
44
0
25 Oct 2023
Enhancing Document Information Analysis with Multi-Task Pre-training: A Robust Approach for Information Extraction in Visually-Rich Documents
Tofik Ali
Partha Pratim Roy
16
0
0
25 Oct 2023
A Multi-Modal Multilingual Benchmark for Document Image Classification
Yoshinari Fujinuma
Siddharth Varia
Nishant Sankaran
Srikar Appalaraju
Bonan Min
Yogarshi Vyas
VLM
22
4
0
25 Oct 2023
GenKIE: Robust Generative Multimodal Document Key Information Extraction
Panfeng Cao
Ye Wang
Qiang Zhang
Zaiqiao Meng
SyDa
29
6
0
24 Oct 2023
DocTrack: A Visually-Rich Document Dataset Really Aligned with Human Eye Movement for Machine Reading
Hao Wang
Qingxuan Wang
Yue Li
Changqing Wang
Chenhui Chu
Rui-cang Wang
VGen
21
3
0
23 Oct 2023
Vision-Enhanced Semantic Entity Recognition in Document Images via Visually-Asymmetric Consistency Learning
Hao Wang
Xiahua Chen
Rui-cang Wang
Chenhui Chu
27
0
0
23 Oct 2023
VKIE: The Application of Key Information Extraction on Video Text
Siyu An
Ye Liu
Haoyuan Peng
Di Yin
27
1
0
18 Oct 2023
Reading Order Matters: Information Extraction from Visually-rich Documents by Token Path Prediction
Chong Zhang
Ya Guo
Yi Tu
Huan Chen
Jinyang Tang
Huijia Zhu
Qi Zhang
Tao Gui
3DV
37
20
0
17 Oct 2023
Overview of ImageArg-2023: The First Shared Task in Multimodal Argument Mining
Zhexiong Liu
Mohamed Elarby
Yang Zhong
Diane Litman
19
11
0
15 Oct 2023
Reinforced UI Instruction Grounding: Towards a Generic UI Task Automation API
Zhizheng Zhang
Wenxuan Xie
Xiaoyi Zhang
Yan Lu
34
10
0
07 Oct 2023
appjsonify: An Academic Paper PDF-to-JSON Conversion Toolkit
Atsuki Yamaguchi
Terufumi Morishita
24
1
0
02 Oct 2023
LogicMP: A Neuro-symbolic Approach for Encoding First-order Logic Constraints
Weidi Xu
Jingwei Wang
Lele Xie
Jianshan He
Hongting Zhou
Taifeng Wang
Xiaopei Wan
Jingdong Chen
Chao Qu
Wei Chu
29
1
0
27 Sep 2023
Document Understanding for Healthcare Referrals
Jimit Mistry
N. Arzeno
MedIm
18
0
0
22 Sep 2023
SCOB: Universal Text Understanding via Character-wise Supervised Contrastive Learning with Online Text Rendering for Bridging Domain Gap
Daehee Kim
Yoon Kim
Donghyun Kim
Yumin Lim
Geewook Kim
Taeho Kil
36
3
0
21 Sep 2023
Kosmos-2.5: A Multimodal Literate Model
Tengchao Lv
Yupan Huang
Jingye Chen
Lei Cui
Shuming Ma
...
Weiyao Luo
Shaoxiang Wu
Guoxin Wang
Cha Zhang
Furu Wei
VLM
MLLM
36
64
0
20 Sep 2023
LMDX: Language Model-based Document Information Extraction and Localization
Vincent Perot
Kai Kang
Florian Luisier
Guolong Su
Xiaoyu Sun
...
Zifeng Wang
Jiaqi Mu
Hao Zhang
Chen-Yu Lee
Nan Hua
56
31
0
19 Sep 2023