2202.13669
Cited By
LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding
28 February 2022
Jiapeng Wang
Lianwen Jin
Kai Ding
VLM
Papers citing
"LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding"
20 / 20 papers shown
KIEval: Evaluation Metric for Document Key Information Extraction
Minsoo Khang
Sang Chul Jung
Sungrae Park
Teakgyu Hong
47
0
0
07 Mar 2025
LiGT: Layout-infused Generative Transformer for Visual Question Answering on Vietnamese Receipts
Thanh-Phong Le
Trung Le Chi Phan
Nghia Hieu Nguyen
Kiet Van Nguyen
ViT
44
0
0
26 Feb 2025
DocMamba: Efficient Document Pre-training with State Space Model
Pengfei Hu
Zhenrong Zhang
Jiefeng Ma
Shuhang Liu
Jun Du
Jianshu Zhang
Mamba
39
1
0
18 Sep 2024
PDF-MVQA: A Dataset for Multimodal Information Retrieval in PDF-based Visual Question Answering
Yihao Ding
Kaixuan Ren
Jiabin Huang
Siwen Luo
S. Han
40
1
0
19 Apr 2024
Noise-Aware Training of Layout-Aware Language Models
Ritesh Sarkhel
Xiaoqi Ren
Lauro Beltrao Costa
Guolong Su
Vincent Perot
Yanan Xie
Emmanouil Koukoumidis
Arnab Nandi
VLM
44
0
0
30 Mar 2024
TextBlockV2: Towards Precise-Detection-Free Scene Text Spotting with Pre-trained Language Model
Jiahao Lyu
Jin Wei
Gangyan Zeng
Zeng Li
Enze Xie
Wei Wang
Yu Zhou
VLM
29
3
0
15 Mar 2024
A Multi-Modal Multilingual Benchmark for Document Image Classification
Yoshinari Fujinuma
Siddharth Varia
Nishant Sankaran
Srikar Appalaraju
Bonan Min
Yogarshi Vyas
VLM
18
4
0
25 Oct 2023
On Evaluation of Document Classification using RVL-CDIP
Stefan Larson
Gordon Lim
Kevin Leach
26
3
0
21 Jun 2023
Multimodal Web Navigation with Instruction-Finetuned Foundation Models
Hiroki Furuta
Kuang-Huei Lee
Ofir Nachum
Yutaka Matsuo
Aleksandra Faust
S. Gu
Izzeddin Gur
LM&Ro
36
91
0
19 May 2023
Language Independent Neuro-Symbolic Semantic Parsing for Form Understanding
Bhanu Prakash Voutharoja
Lizhen Qu
Fatemeh Shiri
22
1
0
08 May 2023
Locate Then Generate: Bridging Vision and Language with Bounding Box for Scene-Text VQA
Yongxin Zhu
Z. Liu
Yukang Liang
Xin Li
Hao Liu
Changcun Bao
Linli Xu
21
6
0
04 Apr 2023
Modeling Entities as Semantic Points for Visual Information Extraction in the Wild
Zhibo Yang
Rujiao Long
Pengfei Wang
Sibo Song
Humen Zhong
Wenqing Cheng
X. Bai
Cong Yao
32
19
0
23 Mar 2023
Entry Separation using a Mixed Visual and Textual Language Model: Application to 19th century French Trade Directories
Bertrand Duménieu
Edwin Carlinet
N. Abadie
Joseph Chazalon
24
0
0
17 Feb 2023
Alignment-Enriched Tuning for Patch-Level Pre-trained Document Image Models
Lei Wang
Jian He
Xingdong Xu
Ning Liu
Hui-juan Liu
33
2
0
27 Nov 2022
Evaluating Out-of-Distribution Performance on Document Image Classifiers
Stefan Larson
Gordon Lim
Yutong Ai
David Kuang
Kevin Leach
OODD
OOD
34
18
0
14 Oct 2022
XDoc: Unified Pre-training for Cross-Format Document Understanding
Jingye Chen
Tengchao Lv
Lei Cui
Changrong Zhang
Furu Wei
48
13
0
06 Oct 2022
Knowing Where and What: Unified Word Block Pretraining for Document Understanding
Song Tao
Zijian Wang
Tiantian Fan
Canjie Luo
Can Huang
SSL
30
2
0
28 Jul 2022
LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding
Yang Xu
Yiheng Xu
Tengchao Lv
Lei Cui
Furu Wei
...
D. Florêncio
Cha Zhang
Wanxiang Che
Min Zhang
Lidong Zhou
ViT
MLLM
150
498
0
29 Dec 2020
FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents
Guillaume Jaume
H. K. Ekenel
Jean-Philippe Thiran
134
355
0
27 May 2019
Aggregated Residual Transformations for Deep Neural Networks
Saining Xie
Ross B. Girshick
Piotr Dollár
Z. Tu
Kaiming He
297
10,216
0
16 Nov 2016