ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2104.08836
  4. Cited By
LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich
  Document Understanding

LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding

18 April 2021
Yiheng Xu
Tengchao Lv
Lei Cui
Guoxin Wang
Yijuan Lu
D. Florêncio
Cha Zhang
Furu Wei
    MLLM
    VLM
ArXivPDFHTML

Papers citing "LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding"

50 / 77 papers shown
Title
XY-Cut++: Advanced Layout Ordering via Hierarchical Mask Mechanism on a Novel Benchmark
XY-Cut++: Advanced Layout Ordering via Hierarchical Mask Mechanism on a Novel Benchmark
Shuai Liu
Youmeng Li
Jizeng Wei
33
0
0
14 Apr 2025
Towards a Multimodal Document-grounded Conversational AI System for Education
Towards a Multimodal Document-grounded Conversational AI System for Education
Karan Taneja
Anjali Singh
Ashok K. Goel
27
0
0
04 Apr 2025
UniHDSA: A Unified Relation Prediction Approach for Hierarchical Document Structure Analysis
UniHDSA: A Unified Relation Prediction Approach for Hierarchical Document Structure Analysis
Jiawei Wang
Kai Hu
Qiang Huo
53
0
0
20 Mar 2025
Large Language Models are Powerful EHR Encoders
Large Language Models are Powerful EHR Encoders
S. Hegselmann
Georg von Arnim
Tillmann Rheude
Noel Kronenberg
David Sontag
Gerhard Hindricks
R. Eils
Benjamin Wild
LM&MA
49
1
0
24 Feb 2025
OmniParser V2: Structured-Points-of-Thought for Unified Visual Text Parsing and Its Generality to Multimodal Large Language Models
OmniParser V2: Structured-Points-of-Thought for Unified Visual Text Parsing and Its Generality to Multimodal Large Language Models
Wenwen Yu
Zhibo Yang
Jianqiang Wan
Sibo Song
J. Tang
Wenqing Cheng
Y. Liu
Xiang Bai
46
1
0
22 Feb 2025
HIP: Hierarchical Point Modeling and Pre-training for Visual Information
  Extraction
HIP: Hierarchical Point Modeling and Pre-training for Visual Information Extraction
Rujiao Long
Pengfei Wang
Zhibo Yang
Cong Yao
34
0
0
02 Nov 2024
ReLayout: Towards Real-World Document Understanding via Layout-enhanced
  Pre-training
ReLayout: Towards Real-World Document Understanding via Layout-enhanced Pre-training
Zhouqiang Jiang
Bowen Wang
Junhao Chen
Yuta Nakashima
22
2
0
14 Oct 2024
Modeling Layout Reading Order as Ordering Relations for Visually-rich
  Document Understanding
Modeling Layout Reading Order as Ordering Relations for Visually-rich Document Understanding
Chong Zhang
Yi Tu
Yixi Zhao
Chenshu Yuan
Huan Chen
...
Mingxu Chai
Ya Guo
Huijia Zhu
Qi Zhang
Tao Gui
41
2
0
29 Sep 2024
The MERIT Dataset: Modelling and Efficiently Rendering Interpretable
  Transcripts
The MERIT Dataset: Modelling and Efficiently Rendering Interpretable Transcripts
I. de Rodrigo
A. Sanchez-Cuadrado
J. Boal
A. J. Lopez-Lopez
VLM
21
1
0
31 Aug 2024
Deep Learning based Visually Rich Document Content Understanding: A
  Survey
Deep Learning based Visually Rich Document Content Understanding: A Survey
Muhammad Ali
Jean Lee
Salman Khan
34
6
0
02 Aug 2024
UNER: A Unified Prediction Head for Named Entity Recognition in
  Visually-rich Documents
UNER: A Unified Prediction Head for Named Entity Recognition in Visually-rich Documents
Yi Tu
Chong Zhang
Ya Guo
Huan Chen
Jinyang Tang
Huijia Zhu
Qi Zhang
38
3
0
02 Aug 2024
VRDSynth: Synthesizing Programs for Multilingual Visually Rich Document
  Information Extraction
VRDSynth: Synthesizing Programs for Multilingual Visually Rich Document Information Extraction
Thanh-Dat Nguyen
Tung Do-Viet
Hung Nguyen-Duy
Tuan-Hai Luu
Hung Le
Bach Le
Patanamon
Thongtanunam
SyDa
28
1
0
09 Jul 2024
Large Language Models Understand Layout
Large Language Models Understand Layout
Weiming Li
Manni Duan
Dong An
Yan Shao
39
3
0
08 Jul 2024
DocParseNet: Advanced Semantic Segmentation and OCR Embeddings for
  Efficient Scanned Document Annotation
DocParseNet: Advanced Semantic Segmentation and OCR Embeddings for Efficient Scanned Document Annotation
Ahmad Mohammadshirazi
Ali Nosrati Firoozsalari
Mengxi Zhou
Dheeraj Kulshrestha
R. Ramnath
31
0
0
25 Jun 2024
GeoContrastNet: Contrastive Key-Value Edge Learning for
  Language-Agnostic Document Understanding
GeoContrastNet: Contrastive Key-Value Edge Learning for Language-Agnostic Document Understanding
Nil Biescas
Carlos Boned Riera
Josep Lladós
Sanket Biswas
42
1
0
06 May 2024
Towards Efficient Resume Understanding: A Multi-Granularity Multi-Modal
  Pre-Training Approach
Towards Efficient Resume Understanding: A Multi-Granularity Multi-Modal Pre-Training Approach
Feihu Jiang
Chuan Qin
Jingshuai Zhang
Kaichun Yao
Xi Chen
Dazhong Shen
Chen Zhu
Hengshu Zhu
Hui Xiong
34
6
0
13 Apr 2024
OmniParser: A Unified Framework for Text Spotting, Key Information
  Extraction and Table Recognition
OmniParser: A Unified Framework for Text Spotting, Key Information Extraction and Table Recognition
Jianqiang Wan
Sibo Song
Wenwen Yu
Yuliang Liu
Wenqing Cheng
Fei Huang
Xiang Bai
Cong Yao
Zhibo Yang
37
26
0
28 Mar 2024
Port Forwarding Services Are Forwarding Security Risks
Port Forwarding Services Are Forwarding Security Risks
Haoyuan Wang
Yue Xue
Xuan Feng
Chao Zhou
Xianghang Mi
19
0
0
24 Mar 2024
Adversarial Training with OCR Modality Perturbation for Scene-Text
  Visual Question Answering
Adversarial Training with OCR Modality Perturbation for Scene-Text Visual Question Answering
Zhixuan Shen
Haonan Luo
Sijia Li
Tianrui Li
21
0
0
14 Mar 2024
Transformers and Language Models in Form Understanding: A Comprehensive
  Review of Scanned Document Analysis
Transformers and Language Models in Form Understanding: A Comprehensive Review of Scanned Document Analysis
Abdelrahman Abdallah
Daniel Eberharter
Zoe Pfister
Adam Jatowt
27
12
0
06 Mar 2024
It's Never Too Late: Fusing Acoustic Information into Large Language
  Models for Automatic Speech Recognition
It's Never Too Late: Fusing Acoustic Information into Large Language Models for Automatic Speech Recognition
Chen Chen
Ruizhe Li
Yuchen Hu
Sabato Marco Siniscalchi
Pin-Yu Chen
Ensiong Chng
Chao-Han Huck Yang
24
19
0
08 Feb 2024
TreeForm: End-to-end Annotation and Evaluation for Form Document Parsing
TreeForm: End-to-end Annotation and Evaluation for Form Document Parsing
Ran Zmigrod
Zhiqiang Ma
Armineh Nourbakhsh
Sameena Shah
19
4
0
07 Feb 2024
PEneo: Unifying Line Extraction, Line Grouping, and Entity Linking for
  End-to-end Document Pair Extraction
PEneo: Unifying Line Extraction, Line Grouping, and Entity Linking for End-to-end Document Pair Extraction
Zening Lin
Jiapeng Wang
Teng Li
Wenhui Liao
Dayi Huang
Longfei Xiong
Lianwen Jin
19
2
0
07 Jan 2024
DocPedia: Unleashing the Power of Large Multimodal Model in the
  Frequency Domain for Versatile Document Understanding
DocPedia: Unleashing the Power of Large Multimodal Model in the Frequency Domain for Versatile Document Understanding
Hao Feng
Qi Liu
Hao Liu
Wen-gang Zhou
Houqiang Li
Can Huang
VLM
25
58
0
20 Nov 2023
DONUT-hole: DONUT Sparsification by Harnessing Knowledge and Optimizing
  Learning Efficiency
DONUT-hole: DONUT Sparsification by Harnessing Knowledge and Optimizing Learning Efficiency
Azhar Shaikh
Michael Cochez
Denis Diachkov
Michiel de Rijcke
Sahar Yousefi
25
0
0
09 Nov 2023
A Scalable Framework for Table of Contents Extraction from Complex ESG
  Annual Reports
A Scalable Framework for Table of Contents Extraction from Complex ESG Annual Reports
Xinyu Wang
Lin Gui
Yulan He
LMTD
13
2
0
27 Oct 2023
A Multi-Modal Multilingual Benchmark for Document Image Classification
A Multi-Modal Multilingual Benchmark for Document Image Classification
Yoshinari Fujinuma
Siddharth Varia
Nishant Sankaran
Srikar Appalaraju
Bonan Min
Yogarshi Vyas
VLM
18
4
0
25 Oct 2023
Reading Order Matters: Information Extraction from Visually-rich
  Documents by Token Path Prediction
Reading Order Matters: Information Extraction from Visually-rich Documents by Token Path Prediction
Chong Zhang
Ya Guo
Yi Tu
Huan Chen
Jinyang Tang
Huijia Zhu
Qi Zhang
Tao Gui
3DV
26
20
0
17 Oct 2023
Kosmos-2.5: A Multimodal Literate Model
Kosmos-2.5: A Multimodal Literate Model
Tengchao Lv
Yupan Huang
Jingye Chen
Lei Cui
Shuming Ma
...
Weiyao Luo
Shaoxiang Wu
Guoxin Wang
Cha Zhang
Furu Wei
VLM
MLLM
23
63
0
20 Sep 2023
Attention Where It Matters: Rethinking Visual Document Understanding
  with Selective Region Concentration
Attention Where It Matters: Rethinking Visual Document Understanding with Selective Region Concentration
H. Cao
Changcun Bao
Chaohu Liu
Huang-wei Chen
Kun Yin
Hao Liu
Yinsong Liu
Deqiang Jiang
Xing Sun
12
13
0
03 Sep 2023
Document AI: A Comparative Study of Transformer-Based, Graph-Based
  Models, and Convolutional Neural Networks For Document Layout Analysis
Document AI: A Comparative Study of Transformer-Based, Graph-Based Models, and Convolutional Neural Networks For Document Layout Analysis
Sotirios Kastanas
Shaomu Tan
Yijiang He
25
1
0
29 Aug 2023
DocPrompt: Large-scale continue pretrain for zero-shot and few-shot
  document question answering
DocPrompt: Large-scale continue pretrain for zero-shot and few-shot document question answering
Sijin Wu
Dan Zhang
Teng Hu
Shikun Feng
16
1
0
21 Aug 2023
Enhancing Visually-Rich Document Understanding via Layout Structure
  Modeling
Enhancing Visually-Rich Document Understanding via Layout Structure Modeling
Qiwei Li
Z. Li
Xiantao Cai
Bo Du
Hai Zhao
22
7
0
15 Aug 2023
Multimodal Document Analytics for Banking Process Automation
Multimodal Document Analytics for Banking Process Automation
C. Gerling
Stefan Lessmann
22
3
0
21 Jul 2023
Multi-Method Self-Training: Improving Code Generation With Text, And
  Vice Versa
Multi-Method Self-Training: Improving Code Generation With Text, And Vice Versa
Shriyash Upadhyay
Etan Ginsberg
SyDa
LRM
19
0
0
20 Jul 2023
PPN: Parallel Pointer-based Network for Key Information Extraction with
  Complex Layouts
PPN: Parallel Pointer-based Network for Key Information Extraction with Complex Layouts
Kaiwen Wei
Jie Yao
Jingyuan Zhang
Yangyang Kang
Fubang Zhao
Yating Zhang
Changlong Sun
Xin Jin
Xin Zhang
8
4
0
20 Jul 2023
Do-GOOD: Towards Distribution Shift Evaluation for Pre-Trained Visual
  Document Understanding Models
Do-GOOD: Towards Distribution Shift Evaluation for Pre-Trained Visual Document Understanding Models
Jiabang He
Yilang Hu
Lei Wang
Xingdong Xu
Ning Liu
Hui-juan Liu
Hengtao Shen
VLM
OOD
22
2
0
05 Jun 2023
Layout and Task Aware Instruction Prompt for Zero-shot Document Image
  Question Answering
Layout and Task Aware Instruction Prompt for Zero-shot Document Image Question Answering
Wenjin Wang
Yunhao Li
Yixin Ou
Yin Zhang
VLM
16
24
0
01 Jun 2023
Global Structure Knowledge-Guided Relation Extraction Method for
  Visually-Rich Document
Global Structure Knowledge-Guided Relation Extraction Method for Visually-Rich Document
Xiangnan Chen
Qianwen Xiao
Juncheng Li
Duo Dong
Jun Lin
Xiaozhong Liu
Siliang Tang
32
5
0
23 May 2023
CCpdf: Building a High Quality Corpus for Visually Rich Documents from
  Web Crawl Data
CCpdf: Building a High Quality Corpus for Visually Rich Documents from Web Crawl Data
M. Turski
Tomasz Stanislawek
Karol Kaczmarek
Pawel Dyda
Filip Graliñski
25
10
0
28 Apr 2023
Information Redundancy and Biases in Public Document Information
  Extraction Benchmarks
Information Redundancy and Biases in Public Document Information Extraction Benchmarks
S. Laatiri
Pirashanth Ratnamogan
Joel Tang
Laurent Lam
William Vanhuffel
Fabien Caspani
20
1
0
28 Apr 2023
Structure Diagram Recognition in Financial Announcements
Structure Diagram Recognition in Financial Announcements
Meixuan Qiao
Jun Wang
Junfu Xiang
Qiyu Hou
Ruixuan Li
25
1
0
26 Apr 2023
GeoLayoutLM: Geometric Pre-training for Visual Information Extraction
GeoLayoutLM: Geometric Pre-training for Visual Information Extraction
Chuwei Luo
Changxu Cheng
Qi Zheng
Cong Yao
11
43
0
21 Apr 2023
Context-Aware Classification of Legal Document Pages
Context-Aware Classification of Legal Document Pages
Pavlos Fragkogiannis
Martina Forster
Grace E. Lee
Dell Zhang
19
5
0
05 Apr 2023
Modeling Entities as Semantic Points for Visual Information Extraction
  in the Wild
Modeling Entities as Semantic Points for Visual Information Extraction in the Wild
Zhibo Yang
Rujiao Long
Pengfei Wang
Sibo Song
Humen Zhong
Wenqing Cheng
X. Bai
Cong Yao
19
19
0
23 Mar 2023
StrucTexTv2: Masked Visual-Textual Prediction for Document Image
  Pre-training
StrucTexTv2: Masked Visual-Textual Prediction for Document Image Pre-training
Yu Yu
Yulin Li
Chengquan Zhang
Xiaoqiang Zhang
Zengyuan Guo
Xiameng Qin
Kun Yao
Junyu Han
Errui Ding
Jingdong Wang
8
45
0
01 Mar 2023
Entry Separation using a Mixed Visual and Textual Language Model:
  Application to 19th century French Trade Directories
Entry Separation using a Mixed Visual and Textual Language Model: Application to 19th century French Trade Directories
Bertrand Duménieu
Edwin Carlinet
N. Abadie
Joseph Chazalon
19
0
0
17 Feb 2023
DocILE Benchmark for Document Information Localization and Extraction
DocILE Benchmark for Document Information Localization and Extraction
vStvepán vSimsa
Milan vSulc
Michal Uvrivcávr
Yash J. Patel
Ahmed Hamdi
...
Matyávs Skalický
Jivrí Matas
Antoine Doucet
Mickael Coustaty
Dimosthenis Karatzas
19
33
0
11 Feb 2023
DocILE 2023 Teaser: Document Information Localization and Extraction
DocILE 2023 Teaser: Document Information Localization and Extraction
vStvepán vSimsa
Milan vSulc
Matyávs Skalický
Yash J. Patel
Ahmed Hamdi
23
2
0
29 Jan 2023
Wukong-Reader: Multi-modal Pre-training for Fine-grained Visual Document
  Understanding
Wukong-Reader: Multi-modal Pre-training for Fine-grained Visual Document Understanding
Haoli Bai
Zhiguang Liu
Xiaojun Meng
Wentao Li
Shuangning Liu
...
Liangwei Wang
Lu Hou
Jiansheng Wei
Xin Jiang
Qun Liu
ViT
22
11
0
19 Dec 2022
12
Next