LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking

18 April 2022

Papers citing "LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking"


Lost in OCR Translation? Vision-Based Approaches to Robust Document Retrieval. Alexander Buschmann Most, Joseph Winjum, Ayan Biswas, Shawn Jones, Nishath Rajiv Ranasinghe, Dan O’Malley, Manish Bhattarai. 08 May 2025.
DocSpiral: A Platform for Integrated Assistive Document Annotation through Human-in-the-Spiral. Qiang Sun, Sirui Li, Tingting Bi, D. Huynh, Mark Reynolds, Yuanyi Luo, Wei Liu. 06 May 2025.
QID: Efficient Query-Informed ViTs in Data-Scarce Regimes for OCR-free Visual Document Understanding. Binh M. Le, Shaoyuan Xu, Jinmiao Fu, Zhishen Huang, Moyan Li, Yanhui Guo, Hongdong Li, Sameera Ramasinghe, Bryan Wang. 03 Apr 2025.
UniHDSA: A Unified Relation Prediction Approach for Hierarchical Document Structure Analysis. Jiawei Wang, Kai Hu, Qiang Huo. 20 Mar 2025.
KIEval: Evaluation Metric for Document Key Information Extraction. Minsoo Khang, Sang Chul Jung, Sungrae Park, Teakgyu Hong. 07 Mar 2025.
LiGT: Layout-infused Generative Transformer for Visual Question Answering on Vietnamese Receipts. Thanh-Phong Le, Trung Le Chi Phan, Nghia Hieu Nguyen, Kiet Van Nguyen. 26 Feb 2025.
Reading the unreadable: Creating a dataset of 19th century English newspapers using image-to-text language models. Jonathan Bourne. 24 Feb 2025.
Granite Vision: a lightweight, open-source multimodal model for enterprise Intelligence. Granite Vision Team, Leonid Karlinsky, Assaf Arbelle, Abraham Daniels, A. Nassar, ..., Sriram Raghavan, T. Syeda-Mahmood, Peter W. J. Staar, Tal Drory, Rogerio Feris. 14 Feb 2025.
Éclair -- Extracting Content and Layout with Integrated Reading Order for Documents. Ilia Karmanov, A. Deshmukh, Lukas Voegtle, Philipp Fischer, Kateryna Chumachenko, ..., Jarno Seppänen, Jupinder Parmar, Joseph Jennings, Andrew Tao, Karan Sapra. 06 Feb 2025.
OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations. Linke Ouyang, Yuan Qu, Hongbin Zhou, Jiawei Zhu, Rui Zhang, ..., Chao Xu, Bo Zhang, Botian Shi, Zhongying Tu, Conghui He. 10 Dec 2024.
M3DocRAG: Multi-modal Retrieval is What You Need for Multi-page Multi-document Understanding. Jaemin Cho, Debanjan Mahata, Ozan Irsoy, Yujie He, Mohit Bansal. 07 Nov 2024.
Driving by the Rules: A Benchmark for Integrating Traffic Sign Regulations into Vectorized HD Map. Xinyuan Chang, Maixuan Xue, Xinran Liu, Zheng Pan, Xing Wei. 31 Oct 2024.
DocMamba: Efficient Document Pre-training with State Space Model. Pengfei Hu, Zhenrong Zhang, Jiefeng Ma, Shuhang Liu, Jun Du, Jianshu Zhang. 18 Sep 2024.
ProcTag: Process Tagging for Assessing the Efficacy of Document Instruction Data. Yufan Shen, Chuwei Luo, Zhaoqing Zhu, Yang Chen, Qi Zheng, Zhi Yu, Jiajun Bu, Cong Yao. 17 Jul 2024.
VisFocus: Prompt-Guided Vision Encoders for OCR-Free Dense Document Understanding. Ofir Abramovich, Niv Nayman, Sharon Fogel, I. Lavi, Ron Litman, Shahar Tsiper, Royee Tichauer, Srikar Appalaraju, Shai Mazor, R. Manmatha. 17 Jul 2024.
ColPali: Efficient Document Retrieval with Vision Language Models. Manuel Faysse, Hugues Sibille, Tony Wu, Bilel Omrani, Gautier Viaud, Céline Hudelot, Pierre Colombo. 27 Jun 2024.
DocParseNet: Advanced Semantic Segmentation and OCR Embeddings for Efficient Scanned Document Annotation. Ahmad Mohammadshirazi, Ali Nosrati Firoozsalari, Mengxi Zhou, Dheeraj Kulshrestha, R. Ramnath. 25 Jun 2024.
DistilDoc: Knowledge Distillation for Visually-Rich Document Applications. Jordy Van Landeghem, Subhajit Maity, Ayan Banerjee, Matthew Blaschko, Marie-Francine Moens, Josep Lladós, Sanket Biswas. 12 Jun 2024.
Reconstructing training data from document understanding models. Jérémie Dentan, Arnaud Paran, A. Shabou. 05 Jun 2024.
Improve Academic Query Resolution through BERT-based Question Extraction from Images. Nidhi Kamal, Saurabh Yadav, Jorawar Singh, Aditi Avasthi. 28 Apr 2024.
A Hybrid Approach for Document Layout Analysis in Document images. Tahira Shehzadi, Didier Stricker, Muhammad Zeshan Afzal. 27 Apr 2024.
PDF-MVQA: A Dataset for Multimodal Information Retrieval in PDF-based Visual Question Answering. Yihao Ding, Kaixuan Ren, Jiabin Huang, Siwen Luo, S. Han. 19 Apr 2024.
DOCMASTER: A Unified Platform for Annotation, Training, & Inference in Document Question-Answering. Alex Nguyen, Zilong Wang, Jingbo Shang, Dheeraj Mekala. 30 Mar 2024.
RealKIE: Five Novel Datasets for Enterprise Key Information Extraction. Benjamin Townsend, Madison May, Christopher Wells. 29 Mar 2024.
Tur[k]ingBench: A Challenge Benchmark for Web Agents. Kevin Xu, Yeganeh Kordi, Kate Sanders, Yizhong Wang, Adam Byerly, Jingyu Zhang, Benjamin Van Durme, Daniel Khashabi. 18 Mar 2024.
DocGraphLM: Documental Graph Language Model for Information Extraction. Dongsheng Wang, Zhiqiang Ma, Armineh Nourbakhsh, Kang Gu, Sameena Shah. 05 Jan 2024.
ESGReveal: An LLM-based approach for extracting structured data from ESG reports. Yi Zou, Mengying Shi, Zhongjie Chen, Zhu Deng, Zongxiong Lei, Zihan Zeng, Shiming Yang, Hongxiang Tong, Lei Xiao, Wenwen Zhou. 25 Dec 2023.
SCOB: Universal Text Understanding via Character-wise Supervised Contrastive Learning with Online Text Rendering for Bridging Domain Gap. Daehee Kim, Yoon Kim, Donghyun Kim, Yumin Lim, Geewook Kim, Taeho Kil. 21 Sep 2023.
A Graphical Approach to Document Layout Analysis. Jilin Wang, Michael Krumdick, Baojia Tong, Hamima Halim, M. Sokolov, Vadym Barda, Delphine Vendryes, Christy Tanner. 03 Aug 2023.
Towards Zero-shot Relation Extraction in Web Mining: A Multimodal Approach with Relative XML Path. Zilong Wang, Jingbo Shang. 23 May 2023.
Language Independent Neuro-Symbolic Semantic Parsing for Form Understanding. Bhanu Prakash Voutharoja, Lizhen Qu, Fatemeh Shiri. 08 May 2023.
The Semantic Reader Project: Augmenting Scholarly Documents through AI-Powered Interactive Reading Interfaces. Kyle Lo, Joseph Chee Chang, Andrew Head, Jonathan Bragg, Amy X. Zhang, ..., Caroline M Wu, Jiangjiang Yang, Angele Zamarron, Marti A. Hearst, Daniel S. Weld. 25 Mar 2023.
Modeling Entities as Semantic Points for Visual Information Extraction in the Wild. Zhibo Yang, Rujiao Long, Pengfei Wang, Sibo Song, Humen Zhong, Wenqing Cheng, X. Bai, Cong Yao. 23 Mar 2023.
Alignment-Enriched Tuning for Patch-Level Pre-trained Document Image Models. Lei Wang, Jian He, Xingdong Xu, Ning Liu, Hui-juan Liu. 27 Nov 2022.
RegCLR: A Self-Supervised Framework for Tabular Representation Learning in the Wild. Weiyao Wang, Byung-Hak Kim, Varun Ganapathi. 02 Nov 2022.
Key Information Extraction in Purchase Documents using Deep Learning and Rule-based Corrections. R. Arroyo, J. Yebes, E. Martínez, Hector Corrales, Javier Lorenzo. 07 Oct 2022.
XDoc: Unified Pre-training for Cross-Format Document Understanding. Jingye Chen, Tengchao Lv, Lei Cui, Changrong Zhang, Furu Wei. 06 Oct 2022.
One-Shot Doc Snippet Detection: Powering Search in Document Beyond Text. Abhinav Java, Shripad Deshmukh, Milan Aggarwal, Surgan Jandial, Mausoom Sarkar, Balaji Krishnamurthy. 12 Sep 2022.
TaCo: Textual Attribute Recognition via Contrastive Learning. Chang Nie, Yiqing Hu, Yanqiu Qu, Hao Liu, Deqiang Jiang, Bo Ren. 22 Aug 2022.
Knowing Where and What: Unified Word Block Pretraining for Document Understanding. Song Tao, Zijian Wang, Tiantian Fan, Canjie Luo, Can Huang. 28 Jul 2022.
Test-Time Adaptation for Visual Document Understanding. Sayna Ebrahimi, Sercan Ö. Arik, Tomas Pfister. 15 Jun 2022.
DiT: Self-supervised Pre-training for Document Image Transformer. Junlong Li, Yiheng Xu, Tengchao Lv, Lei Cui, Chaoxi Zhang, Furu Wei. 04 Mar 2022.
Zero-Shot Text-to-Image Generation. Aditya A. Ramesh, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea Voss, Alec Radford, Mark Chen, Ilya Sutskever. 24 Feb 2021.
LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding. Yang Xu, Yiheng Xu, Tengchao Lv, Lei Cui, Furu Wei, ..., D. Florêncio, Cha Zhang, Wanxiang Che, Min Zhang, Lidong Zhou. 29 Dec 2020.
FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents. Guillaume Jaume, H. K. Ekenel, Jean-Philippe Thiran. 27 May 2019.