2203.02378
Cited By
DiT: Self-supervised Pre-training for Document Image Transformer
4 March 2022
Junlong Li
Yiheng Xu
Tengchao Lv
Lei Cui
Cha Zhang
Furu Wei
ViT
VLM
Papers citing
"DiT: Self-supervised Pre-training for Document Image Transformer"
50 / 104 papers shown
Title
PHD: Pixel-Based Language Modeling of Historical Documents
Nadav Borenstein
Phillip Rust
Desmond Elliott
Isabelle Augenstein
18
3
0
22 Oct 2023
Reading Order Matters: Information Extraction from Visually-rich Documents by Token Path Prediction
Chong Zhang
Ya Guo
Yi Tu
Huan Chen
Jinyang Tang
Huijia Zhu
Qi Zhang
Tao Gui
3DV
21
20
0
17 Oct 2023
DSG: An End-to-End Document Structure Generator
Johannes Rausch
Gentiana Rashiti
Maxim Gusev
Ce Zhang
Stefan Feuerriegel
15
3
0
13 Oct 2023
Harnessing the Power of Multi-Lingual Datasets for Pre-training: Towards Enhancing Text Spotting Performance
Alloy Das
Sanket Biswas
Ayan Banerjee
Josep Lladós
Umapada Pal
Saumik Bhattacharya
23
3
0
02 Oct 2023
SCOB: Universal Text Understanding via Character-wise Supervised Contrastive Learning with Online Text Rendering for Bridging Domain Gap
Daehee Kim
Yoon Kim
Donghyun Kim
Yumin Lim
Geewook Kim
Taeho Kil
21
3
0
21 Sep 2023
Kosmos-2.5: A Multimodal Literate Model
Tengchao Lv
Yupan Huang
Jingye Chen
Lei Cui
Shuming Ma
...
Weiyao Luo
Shaoxiang Wu
Guoxin Wang
Cha Zhang
Furu Wei
VLM
MLLM
21
63
0
20 Sep 2023
Document AI: A Comparative Study of Transformer-Based, Graph-Based Models, and Convolutional Neural Networks For Document Layout Analysis
Sotirios Kastanas
Shaomu Tan
Yijiang He
17
1
0
29 Aug 2023
Vision Grid Transformer for Document Layout Analysis
Cheng Da
Chuwei Luo
Qi Zheng
Cong Yao
ViT
26
27
0
29 Aug 2023
Ensemble of Anchor-Free Models for Robust Bangla Document Layout Segmentation
U. Mong
Md. Asib Rahman
15
0
0
28 Aug 2023
Beyond Document Page Classification: Design, Datasets, and Challenges
Jordy Van Landeghem
Sanket Biswas
Matthew B. Blaschko
Marie-Francine Moens
27
6
0
24 Aug 2023
A Graphical Approach to Document Layout Analysis
Jilin Wang
Michael Krumdick
Baojia Tong
Hamima Halim
M. Sokolov
Vadym Barda
Delphine Vendryes
Christy Tanner
13
8
0
03 Aug 2023
Multimodal Document Analytics for Banking Process Automation
C. Gerling
Stefan Lessmann
22
3
0
21 Jul 2023
LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding
Yanzhe Zhang
Ruiyi Zhang
Jiuxiang Gu
Yufan Zhou
Nedim Lipka
Diyi Yang
Tongfei Sun
VLM
MLLM
25
218
0
29 Jun 2023
Bridging the Performance Gap between DETR and R-CNN for Graphical Object Detection in Document Images
Tahira Shehzadi
K. Hashmi
D. Stricker
Marcus Liwicki
Muhammad Zeshan Afzal
8
7
0
23 Jun 2023
On Evaluation of Document Classification using RVL-CDIP
Stefan Larson
Gordon Lim
Kevin Leach
13
3
0
21 Jun 2023
DocumentNet: Bridging the Data Gap in Document Pre-Training
Lijun Yu
Jin Miao
Xiaoyu Sun
Jiayi Chen
Alexander G. Hauptmann
H. Dai
Wei Wei
22
3
0
15 Jun 2023
Do-GOOD: Towards Distribution Shift Evaluation for Pre-Trained Visual Document Understanding Models
Jiabang He
Yilang Hu
Lei Wang
Xingdong Xu
Ning Liu
Hui-juan Liu
Hengtao Shen
VLM
OOD
22
2
0
05 Jun 2023
LayoutMask: Enhance Text-Layout Interaction in Multi-modal Pre-training for Document Understanding
Yi Tu
Ya Guo
Huan Chen
Jinyang Tang
21
15
0
30 May 2023
GVdoc: Graph-based Visual Document Classification
Fnu Mohbat
Mohammed J. Zaki
Catherine Finegan-Dollak
Ashish Verma
OOD
19
1
0
26 May 2023
ICDAR 2023 Competition on Robust Layout Segmentation in Corporate Documents
Christoph Auer
A. Nassar
Maksym Lysak
Michele Dolfi
Nikolaos Livathinos
Peter W. J. Staar
OOD
3DV
22
6
0
24 May 2023
Sequence-to-Sequence Pre-training with Unified Modality Masking for Visual Document Understanding
ShuWei Feng
Tianyang Zhan
Zhanming Jie
Trung Quoc Luong
Xiaoran Jin
13
1
0
16 May 2023
Document Understanding Dataset and Evaluation (DUDE)
Jordy Van Landeghem
Rubèn Pérez Tito
Łukasz Borchmann
Michal Pietruszka
Pawel Józiak
...
Bertrand Ackaert
Ernest Valveny
Matthew Blaschko
Sien Moens
Tomasz Stanislawek
VGen
14
52
0
15 May 2023
SwinDocSegmenter: An End-to-End Unified Domain Adaptive Transformer for Document Instance Segmentation
Ayan Banerjee
Sanket Biswas
Josep Lladós
Umapada Pal
ViT
12
16
0
08 May 2023
FormNetV2: Multimodal Graph Contrastive Learning for Form Document Information Extraction
Chen-Yu Lee
Chun-Liang Li
Hao Zhang
Timothy Dozat
Vincent Perot
...
Shangbang Long
Siyang Qin
Yasuhisa Fujii
Nan Hua
Tomas Pfister
SSL
40
17
0
04 May 2023
SelfDocSeg: A Self-Supervised vision-based Approach towards Document Segmentation
Subhajit Maity
Sanket Biswas
Siladittya Manna
Ayan Banerjee
Josep Lladós
Saumik Bhattacharya
Umapada Pal
34
5
0
01 May 2023
PARAGRAPH2GRAPH: A GNN-based framework for layout paragraph analysis
Shuyong Wei
Nuo Xu
11
5
0
24 Apr 2023
Expressive Text-to-Image Generation with Rich Text
Songwei Ge
Taesung Park
Jun-Yan Zhu
Jia-Bin Huang
DiffM
77
79
0
13 Apr 2023
Literature Review: Computer Vision Applications in Transportation Logistics and Warehousing
Alexander Naumann
Felix Hertlein
Laura Doerr
Steffen Thoma
K. Furmans
22
8
0
12 Apr 2023
ICL-D3IE: In-Context Learning with Diverse Demonstrations Updating for Document Information Extraction
Jiabang He
Lei Wang
Yingpeng Hu
Ning Liu
Hui-juan Liu
Xingdong Xu
Hengtao Shen
MLLM
6
47
0
09 Mar 2023
ST-KeyS: Self-Supervised Transformer for Keyword Spotting in Historical Handwritten Documents
Sana Khamekhem Jemni
Sourour Ammar
Mohamed Ali Souibgui
Yousri Kessentini
A. Cheddad
15
3
0
06 Mar 2023
StrucTexTv2: Masked Visual-Textual Prediction for Document Image Pre-training
Yu Yu
Yulin Li
Chengquan Zhang
Xiaoqiang Zhang
Zengyuan Guo
Xiameng Qin
Kun Yao
Junyu Han
Errui Ding
Jingdong Wang
8
45
0
01 Mar 2023
Open Problems in Applied Deep Learning
M. Raissi
AI4CE
24
2
0
26 Jan 2023
Wukong-Reader: Multi-modal Pre-training for Fine-grained Visual Document Understanding
Haoli Bai
Zhiguang Liu
Xiaojun Meng
Wentao Li
Shuangning Liu
...
Liangwei Wang
Lu Hou
Jiansheng Wei
Xin Jiang
Qun Liu
ViT
14
11
0
19 Dec 2022
Page Layout Analysis of Text-heavy Historical Documents: a Comparison of Textual and Visual Approaches
Sven Najem-Meyer
Matteo Romanello
9
6
0
12 Dec 2022
Hierarchical multimodal transformers for Multi-Page DocVQA
Rubèn Pérez Tito
Dimosthenis Karatzas
Ernest Valveny
11
54
0
07 Dec 2022
Alignment-Enriched Tuning for Patch-Level Pre-trained Document Image Models
Lei Wang
Jian He
Xingdong Xu
Ning Liu
Hui-juan Liu
27
2
0
27 Nov 2022
Semantic Table Detection with LayoutLMv3
Ivan Silajev
Niels Victor
Phillip Mortimer
9
1
0
25 Nov 2022
Deep learning for table detection and structure recognition: A survey
M. Kasem
Abdelrahman Abdallah
Alexander Berendeyev
Ebrahem Elkady
Mahmoud Abdalla
Mohamed Mahmoud
Mohamed Hamada
D. Nurseitov
I. Taj-Eddin
LMTD
25
25
0
15 Nov 2022
RegCLR: A Self-Supervised Framework for Tabular Representation Learning in the Wild
Weiyao Wang
Byung-Hak Kim
Varun Ganapathi
SSL
LMTD
20
1
0
02 Nov 2022
Evaluating Out-of-Distribution Performance on Document Image Classifiers
Stefan Larson
Gordon Lim
Yutong Ai
David Kuang
Kevin Leach
OODD
OOD
24
18
0
14 Oct 2022
XDoc: Unified Pre-training for Cross-Format Document Understanding
Jingye Chen
Tengchao Lv
Lei Cui
Cha Zhang
Furu Wei
48
13
0
06 Oct 2022
Detect, Retrieve, Comprehend: A Flexible Framework for Zero-Shot Document-Level Question Answering
T. McDonald
Brian Tsan
Amar Saini
Juanita Ordoñez
Luis Gutierrez
Phan-Anh-Huy Nguyen
Blake Mason
Brenda Ng
RALM
9
3
0
04 Oct 2022
One-Shot Doc Snippet Detection: Powering Search in Document Beyond Text
Abhinav Java
Shripad Deshmukh
Milan Aggarwal
Surgan Jandial
Mausoom Sarkar
Balaji Krishnamurthy
30
3
0
12 Sep 2022
TaCo: Textual Attribute Recognition via Contrastive Learning
Chang Nie
Yiqing Hu
Yanqiu Qu
Hao Liu
Deqiang Jiang
Bo Ren
17
0
0
22 Aug 2022
Multimodal Learning with Transformers: A Survey
P. Xu
Xiatian Zhu
David A. Clifton
ViT
41
522
0
13 Jun 2022
VLCDoC: Vision-Language Contrastive Pre-Training Model for Cross-Modal Document Classification
Souhail Bakkali
Zuheng Ming
Mickael Coustaty
Marçal Rusiñol
O. R. Terrades
VLM
35
30
0
24 May 2022
LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking
Yupan Huang
Tengchao Lv
Lei Cui
Yutong Lu
Furu Wei
25
432
0
18 Apr 2022
Neural Graph Matching for Modification Similarity Applied to Electronic Document Comparison
Po-Fang Hsu
Chiching Wei
17
0
0
12 Apr 2022
Masked Autoencoders Are Scalable Vision Learners
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT
TPM
258
7,412
0
11 Nov 2021
Emerging Properties in Self-Supervised Vision Transformers
Mathilde Caron
Hugo Touvron
Ishan Misra
Hervé Jégou
Julien Mairal
Piotr Bojanowski
Armand Joulin
298
5,761
0
29 Apr 2021