ResearchTrend.AI
LayoutLM: Pre-training of Text and Layout for Document Image Understanding

31 December 2019
Yiheng Xu
Minghao Li
Lei Cui
Shaohan Huang
Furu Wei
Ming Zhou
Papers citing "LayoutLM: Pre-training of Text and Layout for Document Image Understanding"

50 / 371 papers shown
Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting
Hao Feng
Shu Wei
Xiang Fei
Wei Shi
Yingdong Han
...
Qi Liu
Chunhui Lin
Jingqun Tang
Hao Liu
Can Huang
14
0
0
20 May 2025
Information Extraction from Visually Rich Documents using LLM-based Organization of Documents into Independent Textual Segments
Aniket Bhattacharyya
Anurag Tripathi
Ujjal Das
Archan Karmakar
Amit Pathak
Maneesh Gupta
9
0
0
18 May 2025
Low-Resource Language Processing: An OCR-Driven Summarization and Translation Pipeline
Hrishit Madhavi
Jacob Cherian
Yuvraj Khamkar
Dhananjay Bhagat
VLM
24
0
0
16 May 2025
DocVXQA: Context-Aware Visual Explanations for Document Question Answering
Mohamed Ali Souibgui
Changkyu Choi
Andrey Barsky
Kangsoo Jung
Ernest Valveny
Dimosthenis Karatzas
28
0
0
12 May 2025
XY-Cut++: Advanced Layout Ordering via Hierarchical Mask Mechanism on a Novel Benchmark
Shuai Liu
Youmeng Li
Jizeng Wei
35
0
0
14 Apr 2025
Relation-Rich Visual Document Generator for Visual Information Extraction
Zi-Han Jiang
Chien-Wei Lin
Wei-Hua Li
Hsuan-Tung Liu
Yi-Ren Yeh
Chu-Song Chen
35
0
0
14 Apr 2025
AI for Climate Finance: Agentic Retrieval and Multi-Step Reasoning for Early Warning System Investments
S. Vaghefi
Aymane Hachcham
Veronica Grasso
Jiska Manicus
Nakiete Msemo
Chiara Colesanti-Senni
Markus Leippold
21
0
0
07 Apr 2025
Towards Visual Text Grounding of Multimodal Large Language Model
Ming Li
Ruiyi Zhang
Jian Chen
Jiuxiang Gu
Yufan Zhou
Franck Dernoncourt
Wanrong Zhu
Dinesh Manocha
Tong Sun
41
2
0
07 Apr 2025
VISTA-OCR: Towards generative and interactive end to end OCR models
Laziz Hamdi
Amine Tamasna
Pascal Boisson
Thierry Paquet
49
0
0
04 Apr 2025
QID: Efficient Query-Informed ViTs in Data-Scarce Regimes for OCR-free Visual Document Understanding
Binh M. Le
Shaoyuan Xu
Jinmiao Fu
Zhishen Huang
Moyan Li
Yanhui Guo
Hongdong Li
Sameera Ramasinghe
Bryan Wang
35
0
0
03 Apr 2025
Leveraging Contrast Information for Efficient Document Shadow Removal
Yong-Jin Liu
Jiancheng Huang
Na Liu
Mingfu Yan
Yi Huang
Shifeng Chen
41
0
0
01 Apr 2025
Improving Applicability of Deep Learning based Token Classification models during Training
Anket Mehra
Malte Prieß
Marian Himstedt
46
0
0
28 Mar 2025
BiblioPage: A Dataset of Scanned Title Pages for Bibliographic Metadata Extraction
Jan Kohút
Martin Dočekal
Michal Hradiš
Marek Vaško
39
0
0
25 Mar 2025
A Simple yet Effective Layout Token in Large Language Models for Document Understanding
Zhaoqing Zhu
Chuwei Luo
Zirui Shao
Feiyu Gao
Hangdi Xing
Qi Zheng
Ji Zhang
57
0
0
24 Mar 2025
Joint Extraction Matters: Prompt-Based Visual Question Answering for Multi-Field Document Information Extraction
Mengsay Loem
Taiju Hosaka
37
0
0
21 Mar 2025
PP-DocLayout: A Unified Document Layout Detection Model to Accelerate Large-Scale Data Construction
Ting Sun
Cheng Cui
Yuning Du
Yi Liu
56
1
0
21 Mar 2025
M3: 3D-Spatial MultiModal Memory
Xueyan Zou
Yuchen Song
Ri-Zhao Qiu
Xuanbin Peng
Jianglong Ye
Sifei Liu
Xiaolong Wang
3DGS
62
0
0
20 Mar 2025
TextBite: A Historical Czech Document Dataset for Logical Page Segmentation
Martin Kostelník
Karel Beneš
Michal Hradiš
42
0
0
20 Mar 2025
SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion
A. Nassar
Andres Marafioti
Matteo Omenetti
Maksym Lysak
Nikolaos Livathinos
...
Yusik Kim
A. Said Gurbuz
Michele Dolfi
Miquel Farré
Peter W. J. Staar
61
4
0
14 Mar 2025
KIEval: Evaluation Metric for Document Key Information Extraction
Minsoo Khang
Sang Chul Jung
Sungrae Park
Teakgyu Hong
55
0
0
07 Mar 2025
A Token-level Text Image Foundation Model for Document Understanding
Tongkun Guan
Zining Wang
Pei Fu
Zhengtao Guo
Wei-Ming Shen
...
Chen Duan
Hao Sun
Qianyi Jiang
Junfeng Luo
Xiaokang Yang
VLM
52
1
0
04 Mar 2025
Judge a Book by its Cover: Investigating Multi-Modal LLMs for Multi-Page Handwritten Document Transcription
Benjamin Gutteridge
Matthew Thomas Jackson
Toni Kukurin
Xiaowen Dong
36
0
0
27 Feb 2025
LiGT: Layout-infused Generative Transformer for Visual Question Answering on Vietnamese Receipts
Thanh-Phong Le
Trung Le Chi Phan
Nghia Hieu Nguyen
Kiet Van Nguyen
ViT
49
0
0
26 Feb 2025
Problem Solved? Information Extraction Design Space for Layout-Rich Documents using LLMs
Gaye Colakoglu
Gürkan Solmaz
Jonathan Fürst
53
1
0
25 Feb 2025
Visual Reasoning Evaluation of Grok, Deepseek Janus, Gemini, Qwen, Mistral, and ChatGPT
Nidhal Jegham
Marwan Abdelatti
Abdeltawab Hendawi
VLM
LRM
60
1
0
23 Feb 2025
EDocNet: Efficient Datasheet Layout Analysis Based on Focus and Global Knowledge Distillation
Hong Cai Chen
Longchang Wu
Yang Zhang
38
0
0
23 Feb 2025
Multimodal Large Language Models for Text-rich Image Understanding: A Comprehensive Review
Pei Fu
Tongkun Guan
Zining Wang
Zhentao Guo
Chen Duan
...
Boming Chen
Jiayao Ma
Qianyi Jiang
Kai Zhou
Junfeng Luo
VLM
66
0
0
23 Feb 2025
OmniParser V2: Structured-Points-of-Thought for Unified Visual Text Parsing and Its Generality to Multimodal Large Language Models
Wenwen Yu
Zhibo Yang
Jianqiang Wan
Sibo Song
J. Tang
Wenqing Cheng
Yunxing Liu
Xiang Bai
58
3
0
22 Feb 2025
Handwritten Text Recognition: A Survey
Carlos Garrido-Munoz
Antonio Ríos-Vila
Jorge Calvo-Zaragoza
106
0
0
12 Feb 2025
PatentLMM: Large Multimodal Model for Generating Descriptions for Patent Figures
Shivalika Singh
Nakul Sharma
Manish Gupta
Anand Mishra
55
1
0
28 Jan 2025
Spatial Information Integration in Small Language Models for Document Layout Generation and Classification
Pablo Melendez
Clemens Havas
36
0
0
09 Jan 2025
SAIL: Sample-Centric In-Context Learning for Document Information Extraction
Jinyu Zhang
Zhiyuan You
Jize Wang
Xinyi Le
80
1
0
22 Dec 2024
Training LayoutLM from Scratch for Efficient Named-Entity Recognition in the Insurance Domain
Benno Uthayasooriyar
A. Ly
Franck Vermet
Caio Corro
74
0
0
12 Dec 2024
AutoGameUI: Constructing High-Fidelity Game UIs via Multimodal Learning and Interactive Web-Based Tool
Zhongliang Tang
Mengchen Tan
Fei Xia
Qingrong Cheng
Hao Jiang
Yuyao Zhang
36
0
0
06 Nov 2024
HIP: Hierarchical Point Modeling and Pre-training for Visual Information Extraction
Rujiao Long
Pengfei Wang
Zhibo Yang
Cong Yao
46
0
0
02 Nov 2024
Driving by the Rules: A Benchmark for Integrating Traffic Sign Regulations into Vectorized HD Map
Xinyuan Chang
Maixuan Xue
Xinran Liu
Zheng Pan
Xing Wei
62
1
0
31 Oct 2024
RealCQA-V2 : Visual Premise Proving A Manual COT Dataset for Charts
Saleem Ahmed
Ranga Setlur
Venu Govindaraju
ReLM
LRM
25
0
0
29 Oct 2024
MatViX: Multimodal Information Extraction from Visually Rich Articles
Ghazal Khalighinejad
Sharon Scott
Ollie Liu
Kelly L. Anderson
Rickard Stureborg
Aman Tyagi
Bhuwan Dhingra
33
1
0
27 Oct 2024
"What is the value of {templates}?" Rethinking Document Information Extraction Datasets for LLMs
Ran Zmigrod
Pranav Shetty
Mathieu Sibue
Zhiqiang Ma
Armineh Nourbakhsh
Xiaomo Liu
Manuela Veloso
28
0
0
20 Oct 2024
DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception
Zhiyuan Zhao
Hengrui Kang
Bin Wang
Zeang Sheng
35
11
0
16 Oct 2024
ReLayout: Towards Real-World Document Understanding via Layout-enhanced Pre-training
Zhouqiang Jiang
Bowen Wang
Junhao Chen
Yuta Nakashima
30
2
0
14 Oct 2024
TextLap: Customizing Language Models for Text-to-Layout Planning
Jian Chen
Ruiyi Zhang
Yufan Zhou
Jennifer Healey
J. Gu
Zhiqiang Xu
Chong Chen
VLM
44
3
0
09 Oct 2024
DAViD: Domain Adaptive Visually-Rich Document Understanding with Synthetic Insights
Yihao Ding
S. Han
Zechuan Li
Hyunsuk Chung
28
0
0
02 Oct 2024
GraphRevisedIE: Multimodal Information Extraction with Graph-Revised Network
Panfeng Cao
Jian Wu
30
9
0
02 Oct 2024
See then Tell: Enhancing Key Information Extraction with Vision Grounding
Shuhang Liu
Zhenrong Zhang
Pengfei Hu
Jiefeng Ma
Jun Du
Qing Wang
Jianshu Zhang
Chenyu Liu
26
0
0
29 Sep 2024
A comprehensive study of on-device NLP applications -- VQA, automated Form filling, Smart Replies for Linguistic Codeswitching
Naman Goyal
26
0
0
23 Sep 2024
DocMamba: Efficient Document Pre-training with State Space Model
Pengfei Hu
Zhenrong Zhang
Jiefeng Ma
Shuhang Liu
Jun Du
Jianshu Zhang
Mamba
47
1
0
18 Sep 2024
Leveraging Distillation Techniques for Document Understanding: A Case Study with FLAN-T5
Marcel Lamott
Muhammad Armaghan Shakir
38
0
0
17 Sep 2024
RexUniNLU: Recursive Method with Explicit Schema Instructor for Universal NLU
Chengyuan Liu
Shihang Wang
Fubang Zhao
Kun Kuang
Yangyang Kang
Weiming Lu
Changlong Sun
Fei Wu
35
0
0
09 Sep 2024
READoc: A Unified Benchmark for Realistic Document Structured Extraction
Zichao Li
Aizier Abulaiti
Yaojie Lu
Xuanang Chen
Jia Zheng
Hongyu Lin
Xianpei Han
Le Sun
41
5
0
08 Sep 2024