Kleister: Key Information Extraction Datasets Involving Long Documents with Complex Layouts

12 May 2021

Tomasz Stanislawek

Filip Graliński

Anna Wróblewska

Dawid Lipiński

Agnieszka Kaliska

Paulina Rosalska

Bartosz Topolski

Papers citing "Kleister: Key Information Extraction Datasets Involving Long Documents with Complex Layouts"

18 / 68 papers shown

mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding Jiabo Ye Anwen Hu Haiyang Xu Qinghao Ye Mingshi Yan ... Chenliang Li Junfeng Tian Qiang Qi Ji Zhang Feiyan Huang VLM MLLM 27 118 0 04 Jul 2023
DocumentNet: Bridging the Data Gap in Document Pre-Training Lijun Yu Jin Miao Xiaoyu Sun Jiayi Chen Alexander G. Hauptmann H. Dai Wei Wei 24 3 0 15 Jun 2023
Document Understanding Dataset and Evaluation (DUDE) Jordy Van Landeghem Rubèn Pérez Tito Łukasz Borchmann Michal Pietruszka Pawel Józiak ... Bertrand Ackaert Ernest Valveny Matthew Blaschko Sien Moens Tomasz Stanislawek VGen 29 53 0 15 May 2023
SelfDocSeg: A Self-Supervised vision-based Approach towards Document Segmentation Subhajit Maity Sanket Biswas Siladittya Manna Ayan Banerjee Josep Lladós Saumik Bhattacharya Umapada Pal 41 5 0 01 May 2023
Information Redundancy and Biases in Public Document Information Extraction Benchmarks S. Laatiri Pirashanth Ratnamogan Joel Tang Laurent Lam William Vanhuffel Fabien Caspani 33 1 0 28 Apr 2023
Information Extraction from Documents: Question Answering vs Token Classification in real-world setups Laurent Lam Pirashanth Ratnamogan Joel Tang William Vanhuffel Fabien Caspani 26 0 0 21 Apr 2023
DocILE Benchmark for Document Information Localization and Extraction Štěpán Šimsa Milan Šulc Michal Uřičář Yash J. Patel Ahmed Hamdi ... Matyáš Skalický Jiří Matas Antoine Doucet Mickael Coustaty Dimosthenis Karatzas 24 34 0 11 Feb 2023
DocILE 2023 Teaser: Document Information Localization and Extraction Štěpán Šimsa Milan Šulc Matyáš Skalický Yash J. Patel Ahmed Hamdi 31 2 0 29 Jan 2023
An Augmentation Strategy for Visually Rich Documents Jing Xie James Bradley Wendt Yichao Zhou Seth Ebner Sandeep Tata 21 0 0 20 Dec 2022
Unifying Vision, Text, and Layout for Universal Document Processing Zineng Tang Ziyi Yang Guoxin Wang Yuwei Fang Yang Liu Chenguang Zhu Michael Zeng Chao-Yue Zhang Joey Tianyi Zhou VLM 32 107 0 05 Dec 2022
VRDU: A Benchmark for Visually-rich Document Understanding Zilong Wang Yichao Zhou Wei Wei Chen-Yu Lee Sandeep Tata 32 15 0 15 Nov 2022
Understanding Long Documents with Different Position-Aware Attentions Hai Pham Guoxin Wang Yijuan Lu D. Florêncio Changrong Zhang 27 9 0 17 Aug 2022
Information Extraction from Scanned Invoice Images using Text Analysis and Layout Features H. Ha Ales Horak 27 14 0 08 Aug 2022
Business Document Information Extraction: Towards Practical Benchmarks Matyáš Skalický Štěpán Šimsa Michal Uřičář Milan Šulc 33 9 0 20 Jun 2022
Document AI: Benchmarks, Models and Applications Lei Cui Yiheng Xu Tengchao Lv Furu Wei VLM 29 70 0 16 Nov 2021
Going Full-TILT Boogie on Document Understanding with Text-Image-Layout Transformer Rafal Powalski Łukasz Borchmann Dawid Jurkiewicz Tomasz Dwojak Michal Pietruszka Gabriela Pałka ViT 36 157 0 18 Feb 2021
From Dataset Recycling to Multi-Property Extraction and Beyond Tomasz Dwojak Michal Pietruszka Łukasz Borchmann Jakub Chłędowski Filip Graliński 50 5 0 06 Nov 2020
FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents Guillaume Jaume H. K. Ekenel Jean-Philippe Thiran 143 357 0 27 May 2019