XYLayoutLM: Towards Layout-Aware Multimodal Networks For Visually-Rich Document Understanding

14 March 2022
Zhangxuan Gu
Changhua Meng
Ke Wang
Jun Lan
Weiqiang Wang
Ming Gu
Liqing Zhang

Papers citing "XYLayoutLM: Towards Layout-Aware Multimodal Networks For Visually-Rich Document Understanding"

50 / 56 papers shown
Representation Learning for Tabular Data: A Comprehensive Survey
Jun-Peng Jiang
Si-Yang Liu
Hao-Run Cai
Qile Zhou
Han-Jia Ye
LMTD
46
0
0
17 Apr 2025
XY-Cut++: Advanced Layout Ordering via Hierarchical Mask Mechanism on a Novel Benchmark
Shuai Liu
Youmeng Li
Jizeng Wei
33
0
0
14 Apr 2025
A Simple yet Effective Layout Token in Large Language Models for Document Understanding
Zhaoqing Zhu
Chuwei Luo
Zirui Shao
Feiyu Gao
Hangdi Xing
Qi Zheng
Ji Zhang
50
0
0
24 Mar 2025
OmniParser V2: Structured-Points-of-Thought for Unified Visual Text Parsing and Its Generality to Multimodal Large Language Models
Wenwen Yu
Zhibo Yang
Jianqiang Wan
Sibo Song
J. Tang
Wenqing Cheng
Y. Liu
Xiang Bai
46
1
0
22 Feb 2025
SAIL: Sample-Centric In-Context Learning for Document Information Extraction
Jinyu Zhang
Zhiyuan You
Jize Wang
Xinyi Le
69
1
0
22 Dec 2024
HIP: Hierarchical Point Modeling and Pre-training for Visual Information Extraction
Rujiao Long
Pengfei Wang
Zhibo Yang
Cong Yao
39
0
0
02 Nov 2024
ReLayout: Towards Real-World Document Understanding via Layout-enhanced Pre-training
Zhouqiang Jiang
Bowen Wang
Junhao Chen
Yuta Nakashima
22
2
0
14 Oct 2024
Modeling Layout Reading Order as Ordering Relations for Visually-rich Document Understanding
Chong Zhang
Yi Tu
Yixi Zhao
Chenshu Yuan
Huan Chen
...
Mingxu Chai
Ya Guo
Huijia Zhu
Qi Zhang
Tao Gui
41
2
0
29 Sep 2024
DocMamba: Efficient Document Pre-training with State Space Model
Pengfei Hu
Zhenrong Zhang
Jiefeng Ma
Shuhang Liu
Jun Du
Jianshu Zhang
Mamba
37
1
0
18 Sep 2024
The MERIT Dataset: Modelling and Efficiently Rendering Interpretable Transcripts
I. de Rodrigo
A. Sanchez-Cuadrado
J. Boal
A. J. Lopez-Lopez
VLM
21
1
0
31 Aug 2024
Deep Learning based Visually Rich Document Content Understanding: A Survey
Muhammad Ali
Jean Lee
Salman Khan
39
6
0
02 Aug 2024
UNER: A Unified Prediction Head for Named Entity Recognition in Visually-rich Documents
Yi Tu
Chong Zhang
Ya Guo
Huan Chen
Jinyang Tang
Huijia Zhu
Qi Zhang
43
3
0
02 Aug 2024
Hierarchical Multi-modal Transformer for Cross-modal Long Document Classification
Tengfei Liu
Yongli Hu
Junbin Gao
Yanfeng Sun
Baocai Yin
26
0
0
14 Jul 2024
DocKylin: A Large Multimodal Model for Visual Document Understanding with Efficient Visual Slimming
Jiaxin Zhang
Wentao Yang
Songxuan Lai
Zecheng Xie
Lianwen Jin
34
15
0
27 Jun 2024
E-ANT: A Large-Scale Dataset for Efficient Automatic GUI NavigaTion
Ke Wang
Tianyu Xia
Zhangxuan Gu
Yi Zhao
Shuheng Shen
Changhua Meng
Weiqiang Wang
Ke Xu
31
0
0
20 Jun 2024
A Hybrid Approach for Document Layout Analysis in Document images
Tahira Shehzadi
Didier Stricker
Muhammad Zeshan Afzal
29
5
0
27 Apr 2024
HRVDA: High-Resolution Visual Document Assistant
Chaohu Liu
Kun Yin
Haoyu Cao
Xinghua Jiang
Xin Li
Yinsong Liu
Deqiang Jiang
Xing Sun
Linli Xu
VLM
35
23
0
10 Apr 2024
LayoutLLM: Layout Instruction Tuning with Large Language Models for Document Understanding
Chuwei Luo
Yufan Shen
Zhaoqing Zhu
Qi Zheng
Zhi Yu
Cong Yao
29
38
0
08 Apr 2024
OmniParser: A Unified Framework for Text Spotting, Key Information Extraction and Table Recognition
Jianqiang Wan
Sibo Song
Wenwen Yu
Yuliang Liu
Wenqing Cheng
Fei Huang
Xiang Bai
Cong Yao
Zhibo Yang
43
26
0
28 Mar 2024
Detect-Order-Construct: A Tree Construction based Approach for Hierarchical Document Structure Analysis
Jiawei Wang
Kai Hu
Zhuoyao Zhong
Lei-huan Sun
Qiang Huo
25
6
0
22 Jan 2024
PEneo: Unifying Line Extraction, Line Grouping, and Entity Linking for End-to-end Document Pair Extraction
Zening Lin
Jiapeng Wang
Teng Li
Wenhui Liao
Dayi Huang
Longfei Xiong
Lianwen Jin
19
2
0
07 Jan 2024
A Scalable Framework for Table of Contents Extraction from Complex ESG Annual Reports
Xinyu Wang
Lin Gui
Yulan He
LMTD
18
2
0
27 Oct 2023
DocTrack: A Visually-Rich Document Dataset Really Aligned with Human Eye Movement for Machine Reading
Hao Wang
Qingxuan Wang
Yue Li
Changqing Wang
Chenhui Chu
Rui-cang Wang
VGen
21
3
0
23 Oct 2023
Vision-Enhanced Semantic Entity Recognition in Document Images via Visually-Asymmetric Consistency Learning
Hao Wang
Xiahua Chen
Rui-cang Wang
Chenhui Chu
19
0
0
23 Oct 2023
Reading Order Matters: Information Extraction from Visually-rich Documents by Token Path Prediction
Chong Zhang
Ya Guo
Yi Tu
Huan Chen
Jinyang Tang
Huijia Zhu
Qi Zhang
Tao Gui
3DV
26
20
0
17 Oct 2023
Kosmos-2.5: A Multimodal Literate Model
Tengchao Lv
Yupan Huang
Jingye Chen
Lei Cui
Shuming Ma
...
Weiyao Luo
Shaoxiang Wu
Guoxin Wang
Cha Zhang
Furu Wei
VLM
MLLM
23
63
0
20 Sep 2023
Enhancing Visually-Rich Document Understanding via Layout Structure Modeling
Qiwei Li
Z. Li
Xiantao Cai
Bo Du
Hai Zhao
28
7
0
15 Aug 2023
Do-GOOD: Towards Distribution Shift Evaluation for Pre-Trained Visual Document Understanding Models
Jiabang He
Yilang Hu
Lei Wang
Xingdong Xu
Ning Liu
Hui-juan Liu
Hengtao Shen
VLM
OOD
22
2
0
05 Jun 2023
DocFormerv2: Local Features for Document Understanding
Srikar Appalaraju
Peng Tang
Qi Dong
Nishant Sankaran
Yichu Zhou
R. Manmatha
22
39
0
02 Jun 2023
Layout and Task Aware Instruction Prompt for Zero-shot Document Image Question Answering
Wenjin Wang
Yunhao Li
Yixin Ou
Yin Zhang
VLM
21
24
0
01 Jun 2023
LayoutMask: Enhance Text-Layout Interaction in Multi-modal Pre-training for Document Understanding
Yi Tu
Ya Guo
Huan Chen
Jinyang Tang
29
15
0
30 May 2023
Global Structure Knowledge-Guided Relation Extraction Method for Visually-Rich Document
Xiangnan Chen
Qianwen Xiao
Juncheng Li
Duo Dong
Jun Lin
Xiaozhong Liu
Siliang Tang
32
5
0
23 May 2023
Fast-StrucTexT: An Efficient Hourglass Transformer with Modality-guided Dynamic Token Merge for Document Understanding
Mingliang Zhai
Yulin Li
Xiameng Qin
Chen Yi
Qunyi Xie
Chengquan Zhang
Kun Yao
Yuwei Wu
Yunde Jia
13
8
0
19 May 2023
A Review of Data-driven Approaches for Malicious Website Detection
Zeyuan Hu
Ziang Yuan
AAML
16
1
0
16 May 2023
SwinDocSegmenter: An End-to-End Unified Domain Adaptive Transformer for Document Instance Segmentation
Ayan Banerjee
Sanket Biswas
Josep Lladós
Umapada Pal
ViT
12
16
0
08 May 2023
Text Reading Order in Uncontrolled Conditions by Sparse Graph Segmentation
Renshen Wang
Yasuhisa Fujii
Alessandro Bissacco
GNN
19
6
0
04 May 2023
FormNetV2: Multimodal Graph Contrastive Learning for Form Document Information Extraction
Nils Loose
Chun-Liang Li
Hao Zhang
Timothy Dozat
Felix Mächtle
...
Shangbang Long
Siyang Qin
Yasuhisa Fujii
Nan Hua
T. Eisenbarth
SSL
45
17
0
04 May 2023
GeoLayoutLM: Geometric Pre-training for Visual Information Extraction
Chuwei Luo
Changxu Cheng
Qi Zheng
Cong Yao
13
43
0
21 Apr 2023
A Question-Answering Approach to Key Value Pair Extraction from Form-like Document Images
Kai Hu
Zhuoyuan Wu
Zhuoyao Zhong
Weihong Lin
Lei-huan Sun
Qiang Huo
12
10
0
17 Apr 2023
Modeling Entities as Semantic Points for Visual Information Extraction in the Wild
Zhibo Yang
Rujiao Long
Pengfei Wang
Sibo Song
Humen Zhong
Wenqing Cheng
X. Bai
Cong Yao
27
19
0
23 Mar 2023
ICL-D3IE: In-Context Learning with Diverse Demonstrations Updating for Document Information Extraction
Jiabang He
Lei Wang
Yingpeng Hu
Ning Liu
Hui-juan Liu
Xingdong Xu
Hengtao Shen
MLLM
6
47
0
09 Mar 2023
Entry Separation using a Mixed Visual and Textual Language Model: Application to 19th century French Trade Directories
Bertrand Duménieu
Edwin Carlinet
N. Abadie
Joseph Chazalon
19
0
0
17 Feb 2023
DocILE Benchmark for Document Information Localization and Extraction
Štěpán Šimsa
Milan Šulc
Michal Uřičář
Yash J. Patel
Ahmed Hamdi
...
Matyáš Skalický
Jiří Matas
Antoine Doucet
Mickael Coustaty
Dimosthenis Karatzas
24
33
0
11 Feb 2023
Multimodality Representation Learning: A Survey on Evolution, Pretraining and Its Applications
Muhammad Arslan Manzoor
S. Albarri
Ziting Xian
Zaiqiao Meng
Preslav Nakov
Shangsong Liang
AI4TS
21
26
0
01 Feb 2023
Wukong-Reader: Multi-modal Pre-training for Fine-grained Visual Document Understanding
Haoli Bai
Zhiguang Liu
Xiaojun Meng
Wentao Li
Shuangning Liu
...
Liangwei Wang
Lu Hou
Jiansheng Wei
Xin Jiang
Qun Liu
ViT
22
11
0
19 Dec 2022
Unifying Vision, Text, and Layout for Universal Document Processing
Zineng Tang
Ziyi Yang
Guoxin Wang
Yuwei Fang
Yang Liu
Chenguang Zhu
Michael Zeng
Chao-Yue Zhang
Mohit Bansal
VLM
30
105
0
05 Dec 2022
Alignment-Enriched Tuning for Patch-Level Pre-trained Document Image Models
Lei Wang
Jian He
Xingdong Xu
Ning Liu
Hui-juan Liu
31
2
0
27 Nov 2022
Unimodal and Multimodal Representation Training for Relation Extraction
Ciaran Cooney
Rachel Heyburn
Liam Maddigan
Mairead O'Cuinn
Chloe Thompson
Joana Cavadas
20
2
0
11 Nov 2022
ERNIE-Layout: Layout Knowledge Enhanced Pre-training for Visually-rich Document Understanding
Qiming Peng
Yinxu Pan
Wenjin Wang
Bin Luo
Zhenyu Zhang
...
Shi Feng
Yu Sun
Hao Tian
Hua-Hong Wu
Haifeng Wang
8
83
0
12 Oct 2022
PP-StructureV2: A Stronger Document Analysis System
Chenxia Li
Ruoyu Guo
Jun Zhou
Mengtao An
Yuning Du
Lingfeng Zhu
Yi Liu
Xiaoguang Hu
Dianhai Yu
49
22
0
11 Oct 2022