DiT: Self-supervised Pre-training for Document Image Transformer

4 March 2022

Papers citing "DiT: Self-supervised Pre-training for Document Image Transformer"

50 / 104 papers shown

PHD: Pixel-Based Language Modeling of Historical Documents Nadav Borenstein Phillip Rust Desmond Elliott Isabelle Augenstein 18 3 0 22 Oct 2023
Reading Order Matters: Information Extraction from Visually-rich Documents by Token Path Prediction Chong Zhang Ya Guo Yi Tu Huan Chen Jinyang Tang Huijia Zhu Qi Zhang Tao Gui 3DV 21 20 0 17 Oct 2023
DSG: An End-to-End Document Structure Generator Johannes Rausch Gentiana Rashiti Maxim Gusev Ce Zhang Stefan Feuerriegel 15 3 0 13 Oct 2023
Harnessing the Power of Multi-Lingual Datasets for Pre-training: Towards Enhancing Text Spotting Performance Alloy Das Sanket Biswas Ayan Banerjee Josep Lladós Umapada Pal Saumik Bhattacharya 23 3 0 02 Oct 2023
SCOB: Universal Text Understanding via Character-wise Supervised Contrastive Learning with Online Text Rendering for Bridging Domain Gap Daehee Kim Yoon Kim Donghyun Kim Yumin Lim Geewook Kim Taeho Kil 21 3 0 21 Sep 2023
Kosmos-2.5: A Multimodal Literate Model Tengchao Lv Yupan Huang Jingye Chen Lei Cui Shuming Ma ... Weiyao Luo Shaoxiang Wu Guoxin Wang Cha Zhang Furu Wei VLM MLLM 21 63 0 20 Sep 2023
Document AI: A Comparative Study of Transformer-Based, Graph-Based Models, and Convolutional Neural Networks For Document Layout Analysis Sotirios Kastanas Shaomu Tan Yijiang He 17 1 0 29 Aug 2023
Vision Grid Transformer for Document Layout Analysis Cheng Da Chuwei Luo Qi Zheng Cong Yao ViT 26 27 0 29 Aug 2023
Ensemble of Anchor-Free Models for Robust Bangla Document Layout Segmentation U. Mong Md. Asib Rahman 15 0 0 28 Aug 2023
Beyond Document Page Classification: Design, Datasets, and Challenges Jordy Van Landeghem Sanket Biswas Matthew B. Blaschko Marie-Francine Moens 27 6 0 24 Aug 2023
A Graphical Approach to Document Layout Analysis Jilin Wang Michael Krumdick Baojia Tong Hamima Halim M. Sokolov Vadym Barda Delphine Vendryes Christy Tanner 13 8 0 03 Aug 2023
Multimodal Document Analytics for Banking Process Automation C. Gerling Stefan Lessmann 22 3 0 21 Jul 2023
LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding Yanzhe Zhang Ruiyi Zhang Jiuxiang Gu Yufan Zhou Nedim Lipka Diyi Yang Tongfei Sun VLM MLLM 25 218 0 29 Jun 2023
Bridging the Performance Gap between DETR and R-CNN for Graphical Object Detection in Document Images Tahira Shehzadi K. Hashmi D. Stricker Marcus Liwicki Muhammad Zeshan Afzal 8 7 0 23 Jun 2023
On Evaluation of Document Classification using RVL-CDIP Stefan Larson Gordon Lim Kevin Leach 13 3 0 21 Jun 2023
DocumentNet: Bridging the Data Gap in Document Pre-Training Lijun Yu Jin Miao Xiaoyu Sun Jiayi Chen Alexander G. Hauptmann H. Dai Wei Wei 22 3 0 15 Jun 2023
Do-GOOD: Towards Distribution Shift Evaluation for Pre-Trained Visual Document Understanding Models Jiabang He Yilang Hu Lei Wang Xingdong Xu Ning Liu Hui-juan Liu Hengtao Shen VLM OOD 22 2 0 05 Jun 2023
LayoutMask: Enhance Text-Layout Interaction in Multi-modal Pre-training for Document Understanding Yi Tu Ya Guo Huan Chen Jinyang Tang 21 15 0 30 May 2023
GVdoc: Graph-based Visual Document Classification Fnu Mohbat Mohammed J. Zaki Catherine Finegan-Dollak Ashish Verma OOD 19 1 0 26 May 2023
ICDAR 2023 Competition on Robust Layout Segmentation in Corporate Documents Christoph Auer A. Nassar Maksym Lysak Michele Dolfi Nikolaos Livathinos Peter W. J. Staar OOD 3DV 22 6 0 24 May 2023
Sequence-to-Sequence Pre-training with Unified Modality Masking for Visual Document Understanding ShuWei Feng Tianyang Zhan Zhanming Jie Trung Quoc Luong Xiaoran Jin 13 1 0 16 May 2023
Document Understanding Dataset and Evaluation (DUDE) Jordy Van Landeghem Rubèn Pérez Tito Łukasz Borchmann Michal Pietruszka Pawel Józiak ... Bertrand Ackaert Ernest Valveny Matthew Blaschko Sien Moens Tomasz Stanislawek VGen 14 52 0 15 May 2023
SwinDocSegmenter: An End-to-End Unified Domain Adaptive Transformer for Document Instance Segmentation Ayan Banerjee Sanket Biswas Josep Lladós Umapada Pal ViT 12 16 0 08 May 2023
FormNetV2: Multimodal Graph Contrastive Learning for Form Document Information Extraction Chen-Yu Lee Chun-Liang Li Hao Zhang Timothy Dozat Vincent Perot ... Shangbang Long Siyang Qin Yasuhisa Fujii Nan Hua Tomas Pfister SSL 40 17 0 04 May 2023
SelfDocSeg: A Self-Supervised vision-based Approach towards Document Segmentation Subhajit Maity Sanket Biswas Siladittya Manna Ayan Banerjee Josep Lladós Saumik Bhattacharya Umapada Pal 34 5 0 01 May 2023
PARAGRAPH2GRAPH: A GNN-based framework for layout paragraph analysis Shuyong Wei Nuo Xu 11 5 0 24 Apr 2023
Expressive Text-to-Image Generation with Rich Text Songwei Ge Taesung Park Jun-Yan Zhu Jia-Bin Huang DiffM 77 79 0 13 Apr 2023
Literature Review: Computer Vision Applications in Transportation Logistics and Warehousing Alexander Naumann Felix Hertlein Laura Doerr Steffen Thoma K. Furmans 22 8 0 12 Apr 2023
ICL-D3IE: In-Context Learning with Diverse Demonstrations Updating for Document Information Extraction Jiabang He Lei Wang Yingpeng Hu Ning Liu Hui-juan Liu Xingdong Xu Hengtao Shen MLLM 6 47 0 09 Mar 2023
ST-KeyS: Self-Supervised Transformer for Keyword Spotting in Historical Handwritten Documents Sana Khamekhem Jemni Sourour Ammar Mohamed Ali Souibgui Yousri Kessentini A. Cheddad 15 3 0 06 Mar 2023
StrucTexTv2: Masked Visual-Textual Prediction for Document Image Pre-training Yu Yu Yulin Li Chengquan Zhang Xiaoqiang Zhang Zengyuan Guo Xiameng Qin Kun Yao Junyu Han Errui Ding Jingdong Wang 8 45 0 01 Mar 2023
Open Problems in Applied Deep Learning M. Raissi AI4CE 24 2 0 26 Jan 2023
Wukong-Reader: Multi-modal Pre-training for Fine-grained Visual Document Understanding Haoli Bai Zhiguang Liu Xiaojun Meng Wentao Li Shuangning Liu ... Liangwei Wang Lu Hou Jiansheng Wei Xin Jiang Qun Liu ViT 14 11 0 19 Dec 2022
Page Layout Analysis of Text-heavy Historical Documents: a Comparison of Textual and Visual Approaches Sven Najem-Meyer Matteo Romanello 9 6 0 12 Dec 2022
Hierarchical multimodal transformers for Multi-Page DocVQA Rubèn Pérez Tito Dimosthenis Karatzas Ernest Valveny 11 54 0 07 Dec 2022
Alignment-Enriched Tuning for Patch-Level Pre-trained Document Image Models Lei Wang Jian He Xingdong Xu Ning Liu Hui-juan Liu 27 2 0 27 Nov 2022
Semantic Table Detection with LayoutLMv3 Ivan Silajev Niels Victor Phillip Mortimer 9 1 0 25 Nov 2022
Deep learning for table detection and structure recognition: A survey M. Kasem Abdelrahman Abdallah Alexander Berendeyev Ebrahem Elkady Mahmoud Abdalla Mohamed Mahmoud Mohamed Hamada D. Nurseitov I. Taj-Eddin LMTD 25 25 0 15 Nov 2022
RegCLR: A Self-Supervised Framework for Tabular Representation Learning in the Wild Weiyao Wang Byung-Hak Kim Varun Ganapathi SSL LMTD 20 1 0 02 Nov 2022
Evaluating Out-of-Distribution Performance on Document Image Classifiers Stefan Larson Gordon Lim Yutong Ai David Kuang Kevin Leach OODD OOD 24 18 0 14 Oct 2022
XDoc: Unified Pre-training for Cross-Format Document Understanding Jingye Chen Tengchao Lv Lei Cui Changrong Zhang Furu Wei 48 13 0 06 Oct 2022
Detect, Retrieve, Comprehend: A Flexible Framework for Zero-Shot Document-Level Question Answering T. McDonald Brian Tsan Amar Saini Juanita Ordoñez Luis Gutierrez Phan-Anh-Huy Nguyen Blake Mason Brenda Ng RALM 9 3 0 04 Oct 2022
One-Shot Doc Snippet Detection: Powering Search in Document Beyond Text Abhinav Java Shripad Deshmukh Milan Aggarwal Surgan Jandial Mausoom Sarkar Balaji Krishnamurthy 30 3 0 12 Sep 2022
TaCo: Textual Attribute Recognition via Contrastive Learning Chang Nie Yiqing Hu Yanqiu Qu Hao Liu Deqiang Jiang Bo Ren 17 0 0 22 Aug 2022
Multimodal Learning with Transformers: A Survey P. Xu Xiatian Zhu David A. Clifton ViT 41 522 0 13 Jun 2022
VLCDoC: Vision-Language Contrastive Pre-Training Model for Cross-Modal Document Classification Souhail Bakkali Zuheng Ming Mickael Coustaty Marçal Rusiñol O. R. Terrades VLM 35 30 0 24 May 2022
LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking Yupan Huang Tengchao Lv Lei Cui Yutong Lu Furu Wei 25 432 0 18 Apr 2022
Neural Graph Matching for Modification Similarity Applied to Electronic Document Comparison Po-Fang Hsu Chiching Wei 17 0 0 12 Apr 2022
Masked Autoencoders Are Scalable Vision Learners Kaiming He Xinlei Chen Saining Xie Yanghao Li Piotr Dollár Ross B. Girshick ViT TPM 258 7,412 0 11 Nov 2021
Emerging Properties in Self-Supervised Vision Transformers Mathilde Caron Hugo Touvron Ishan Misra Hervé Jégou Julien Mairal Piotr Bojanowski Armand Joulin 298 5,761 0 29 Apr 2021