Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2203.02378
Cited By
DiT: Self-supervised Pre-training for Document Image Transformer
4 March 2022
Junlong Li
Yiheng Xu
Tengchao Lv
Lei Cui
Chaoxi Zhang
Furu Wei
ViT
VLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"DiT: Self-supervised Pre-training for Document Image Transformer"
50 / 104 papers shown
Title
Single Document Image Highlight Removal via A Large-Scale Real-World Dataset and A Location-Aware Network
Lu Pan
Yu-Hsuan Huang
Hongxia Xie
Cheng Zhang
H Zhao
Hong-Han Shuai
Wen-Huang Cheng
23
0
0
19 Apr 2025
DocSAM: Unified Document Image Segmentation via Query Decomposition and Heterogeneous Mixed Learning
Xiao-Hui Li
Fei Yin
Cheng-Lin Liu
23
0
0
05 Apr 2025
Masked Self-Supervised Pre-Training for Text Recognition Transformers on Large-Scale Datasets
Martin Kiss
Michal Hradiš
34
0
0
28 Mar 2025
BiblioPage: A Dataset of Scanned Title Pages for Bibliographic Metadata Extraction
Jan Kohút
Martin Dočekal
Michal Hradiš
Marek Vaško
32
0
0
25 Mar 2025
SFDLA: Source-Free Document Layout Analysis
Sebastian Tewes
Yufan Chen
Omar Moured
Jiaming Zhang
Rainer Stiefelhagen
48
0
0
24 Mar 2025
PP-DocLayout: A Unified Document Layout Detection Model to Accelerate Large-Scale Data Construction
Ting Sun
Cheng Cui
Yuning Du
Yi Liu
36
1
0
21 Mar 2025
UniHDSA: A Unified Relation Prediction Approach for Hierarchical Document Structure Analysis
Jiawei Wang
Kai Hu
Qiang Huo
53
0
0
20 Mar 2025
TextBite: A Historical Czech Document Dataset for Logical Page Segmentation
Martin Kostelník
Karel Beneš
Michal Hradiš
32
0
0
20 Mar 2025
LiGT: Layout-infused Generative Transformer for Visual Question Answering on Vietnamese Receipts
Thanh-Phong Le
Trung Le Chi Phan
Nghia Hieu Nguyen
Kiet Van Nguyen
ViT
44
0
0
26 Feb 2025
Reading the unreadable: Creating a dataset of 19th century English newspapers using image-to-text language models
Jonathan Bourne
75
0
0
24 Feb 2025
EDocNet: Efficient Datasheet Layout Analysis Based on Focus and Global Knowledge Distillation
Hong Cai Chen
Longchang Wu
Yang Zhang
34
0
0
23 Feb 2025
Label Errors in the Tobacco3482 Dataset
Gordon Lim
Stefan Larson
Kevin Leach
74
0
0
17 Dec 2024
OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations
Linke Ouyang
Yuan Qu
Hongbin Zhou
Jiawei Zhu
Rui Zhang
...
Chao Xu
Bo Zhang
Botian Shi
Zhongying Tu
Conghui He
96
5
0
10 Dec 2024
NeurIPS 2023 Competition: Privacy Preserving Federated Learning Document VQA
Marlon Tobaben
Mohamed Ali Souibgui
Rubèn Pérez Tito
Khanh Nguyen
Raouf Kerkouche
...
Josep Lladós
Ernest Valveny
Antti Honkela
Mario Fritz
Dimosthenis Karatzas
FedML
28
0
0
06 Nov 2024
DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception
Zhiyuan Zhao
Hengrui Kang
Bin Wang
Conghui He
25
9
0
16 Oct 2024
Present and Future Generalization of Synthetic Image Detectors
Pablo Bernabeu Perez
Enrique Lopez-Cuena
Dario Garcia-Gasulla
16
0
0
21 Sep 2024
ViRED: Prediction of Visual Relations in Engineering Drawings
Chao Gu
Ke Lin
Yiyang Luo
Jiahui Hou
Xiang-Yang Li
16
0
0
02 Sep 2024
μgat: Improving Single-Page Document Parsing by Providing Multi-Page Context
Fabio Quattrini
Carmine Zaccagnino
Silvia Cascianelli
Laura Righi
Rita Cucchiara
31
1
0
28 Aug 2024
Large Language Models for Page Stream Segmentation
H. Heidenreich
Ratish Dalvi
Rohith Mukku
Nikhil Verma
Neven Pičuljan
35
0
0
21 Aug 2024
Deep Learning based Visually Rich Document Content Understanding: A Survey
Muhammad Ali
Jean Lee
Salman Khan
29
6
0
02 Aug 2024
UNER: A Unified Prediction Head for Named Entity Recognition in Visually-rich Documents
Yi Tu
Chong Zhang
Ya Guo
Huan Chen
Jinyang Tang
Huijia Zhu
Qi Zhang
38
3
0
02 Aug 2024
SciPostLayout: A Dataset for Layout Analysis and Layout Generation of Scientific Posters
Shohei Tanaka
Hao Wang
Yoshitaka Ushiku
19
0
0
29 Jul 2024
DistilDoc: Knowledge Distillation for Visually-Rich Document Applications
Jordy Van Landeghem
Subhajit Maity
Ayan Banerjee
Matthew Blaschko
Marie-Francine Moens
Josep Lladós
Sanket Biswas
41
2
0
12 Jun 2024
TRINS: Towards Multimodal Language Models that Can Read
Ruiyi Zhang
Yanzhe Zhang
Jian Chen
Yufan Zhou
Jiuxiang Gu
Changyou Chen
Tong Sun
VLM
26
6
0
10 Jun 2024
DeMamba: AI-Generated Video Detection on Million-Scale GenVideo Benchmark
Haoxing Chen
Yan Hong
Zizheng Huang
Zhuoer Xu
Zhangxuan Gu
...
Jun Lan
Huijia Zhu
Jianfu Zhang
Weiqiang Wang
Huaxiong Li
Mamba
83
13
0
30 May 2024
Leveraging Semantic Segmentation Masks with Embeddings for Fine-Grained Form Classification
Taylor Archibald
Tony R. Martinez
AI4TS
21
0
0
23 May 2024
DLAFormer: An End-to-End Transformer For Document Layout Analysis
Jiawei Wang
Kai Hu
Qiang Huo
3DV
ViT
22
3
0
20 May 2024
Self-supervised Pre-training of Text Recognizers
M. Kišš
Michal Hradiš
SSL
32
1
0
01 May 2024
Multi-Page Document Visual Question Answering using Self-Attention Scoring Mechanism
Lei Kang
Rubèn Pérez Tito
Ernest Valveny
Dimosthenis Karatzas
25
5
0
29 Apr 2024
A Hybrid Approach for Document Layout Analysis in Document images
Tahira Shehzadi
Didier Stricker
Muhammad Zeshan Afzal
29
5
0
27 Apr 2024
DiffFAE: Advancing High-fidelity One-shot Facial Appearance Editing with Space-sensitive Customization and Semantic Preservation
Qilin Wang
Jiangning Zhang
Chengming Xu
Weijian Cao
Ying Tai
Yue Han
Yanhao Ge
Hong Gu
Chengjie Wang
Yanwei Fu
DiffM
35
0
0
26 Mar 2024
RoDLA: Benchmarking the Robustness of Document Layout Analysis Models
Yufan Chen
Jiaming Zhang
Kunyu Peng
Junwei Zheng
Ruiping Liu
Philip H. S. Torr
Rainer Stiefelhagen
OOD
29
5
0
21 Mar 2024
LayoutLLM: Large Language Model Instruction Tuning for Visually Rich Document Understanding
Masato Fujitake
MLLM
14
15
0
21 Mar 2024
Transformers and Language Models in Form Understanding: A Comprehensive Review of Scanned Document Analysis
Abdelrahman Abdallah
Daniel Eberharter
Zoe Pfister
Adam Jatowt
27
12
0
06 Mar 2024
Hierarchical Multimodal Pre-training for Visually Rich Webpage Understanding
Hongshen Xu
Lu Chen
Zihan Zhao
Da Ma
Ruisheng Cao
Zichen Zhu
Kai Yu
29
2
0
28 Feb 2024
Universal Physics Transformers: A Framework For Efficiently Scaling Neural Operators
Benedikt Alkin
Andreas Fürst
Simon Schmid
Lukas Gruber
Markus Holzleitner
Johannes Brandstetter
PINN
AI4CE
35
8
0
19 Feb 2024
GraphKD: Exploring Knowledge Distillation Towards Document Object Detection with Structured Graph Creation
Ayan Banerjee
Sanket Biswas
Josep Lladós
Umapada Pal
38
1
0
17 Feb 2024
Text Role Classification in Scientific Charts Using Multimodal Transformers
Hye Jin Kim
N. Lell
A. Scherp
14
0
0
08 Feb 2024
Detect-Order-Construct: A Tree Construction based Approach for Hierarchical Document Structure Analysis
Jiawei Wang
Kai Hu
Zhuoyao Zhong
Lei-huan Sun
Qiang Huo
19
6
0
22 Jan 2024
Dynamic Relation Transformer for Contextual Text Block Detection
Jiawei Wang
Shunchi Zhang
Kai Hu
Chixiang Ma
Zhuoyao Zhong
Lei-huan Sun
Qiang Huo
16
0
0
17 Jan 2024
Skin Cancer Segmentation and Classification Using Vision Transformer for Automatic Analysis in Dermatoscopy-based Non-invasive Digital System
Galib Muhammad Shahriar Himel
Md. Masudul Islam
Kh Abdullah Al-Aff
Shams Ibne Karim
Md. Kabir Uddin Sikder
MedIm
13
23
0
09 Jan 2024
DocLLM: A layout-aware generative language model for multimodal document understanding
Dongsheng Wang
Natraj Raman
Mathieu Sibue
Zhiqiang Ma
Petr Babkin
Simerjot Kaur
Yulong Pei
Armineh Nourbakhsh
Xiaomo Liu
VLM
12
50
0
31 Dec 2023
TDeLTA: A Light-weight and Robust Table Detection Method based on Learning Text Arrangement
Yang Fan
Xiangping Wu
Qingcai Chen
Heng Li
Yan Huang
Zhixiang Cai
Qitian Wu
LMTD
16
0
0
18 Dec 2023
WordScape: a Pipeline to extract multilingual, visually rich Documents with Layout Annotations from Web Crawl Data
Maurice Weber
Carlo Siebenschuh
Rory Butler
Anton Alexandrov
Valdemar Thanner
...
Haris Jabbar
Ian T. Foster
Bo-wen Li
Rick L. Stevens
Ce Zhang
11
4
0
15 Dec 2023
Privacy-Aware Document Visual Question Answering
Rubèn Pérez Tito
Khanh Nguyen
Marlon Tobaben
Raouf Kerkouche
Mohamed Ali Souibgui
...
Lei Kang
Ernest Valveny
Antti Honkela
Mario Fritz
Dimosthenis Karatzas
22
13
0
15 Dec 2023
ESG Accountability Made Easy: DocQA at Your Service
Lokesh Mishra
Cesar Berrospi
K. Dinkla
Diego Antognini
Francesco Fusco
...
Panagiotis Vagenas
Lucas Morin
Christoph Auer
Michele Dolfi
Peter W. J. Staar
23
3
0
30 Nov 2023
High-Performance Transformers for Table Structure Recognition Need Early Convolutions
Sheng-Hsuan Peng
Seongmin Lee
Xiaojing Wang
Rajarajeswari Balasubramaniyan
Duen Horng Chau
ViT
LMTD
19
3
0
09 Nov 2023
DCQA: Document-Level Chart Question Answering towards Complex Reasoning and Common-Sense Understanding
Anran Wu
Luwei Xiao
Xingjiao Wu
Shuwen Yang
Junjie Xu
Zisong Zhuang
Nian Xie
Cheng Jin
Liang He
16
0
0
29 Oct 2023
Enhancing Document Information Analysis with Multi-Task Pre-training: A Robust Approach for Information Extraction in Visually-Rich Documents
Tofik Ali
Partha Pratim Roy
13
0
0
25 Oct 2023
A Multi-Modal Multilingual Benchmark for Document Image Classification
Yoshinari Fujinuma
Siddharth Varia
Nishant Sankaran
Srikar Appalaraju
Bonan Min
Yogarshi Vyas
VLM
18
4
0
25 Oct 2023
1
2
3
Next