2106.11539
DocFormer: End-to-End Transformer for Document Understanding
IEEE International Conference on Computer Vision (ICCV), 2021
22 June 2021
Srikar Appalaraju
Bhavan A. Jasani
Bhargava Urala Kota
Yusheng Xie
R. Manmatha
ViT
Papers citing
"DocFormer: End-to-End Transformer for Document Understanding"
50 / 205 papers shown
ProcTag: Process Tagging for Assessing the Efficacy of Document Instruction Data
Yufan Shen
Chuwei Luo
Zhaoqing Zhu
Yang Chen
Qi Zheng
Zhi Yu
Jiajun Bu
Cong Yao
357
5
0
17 Jul 2024
RAVEN: Multitask Retrieval Augmented Vision-Language Learning
Varun Nagaraj Rao
Siddharth Choudhary
Aditya Deshpande
R. Satzoda
Srikar Appalaraju
RALM
VLM
223
7
0
27 Jun 2024
DocKylin: A Large Multimodal Model for Visual Document Understanding with Efficient Visual Slimming
Jiaxin Zhang
Wentao Yang
Songxuan Lai
Zecheng Xie
Lianwen Jin
339
27
0
27 Jun 2024
ColPali: Efficient Document Retrieval with Vision Language Models
Manuel Faysse
Hugues Sibille
Tony Wu
Bilel Omrani
Gautier Viaud
Céline Hudelot
Pierre Colombo
VLM
800
88
0
27 Jun 2024
SRFUND: A Multi-Granularity Hierarchical Structure Reconstruction Benchmark in Form Understanding
Jiefeng Ma
Yan Wang
Chenyu Liu
Jun Du
Yu Hu
Zhenrong Zhang
Pengfei Hu
Qing Wang
Jianshu Zhang
166
1
0
13 Jun 2024
DistilDoc: Knowledge Distillation for Visually-Rich Document Applications
Jordy Van Landeghem
Subhajit Maity
Ayan Banerjee
Matthew Blaschko
Marie-Francine Moens
Josep Lladós
Sanket Biswas
298
3
0
12 Jun 2024
UnSupDLA: Towards Unsupervised Document Layout Analysis
Talha Uddin Sheikh
Tahira Shehzadi
K. Hashmi
Didier Stricker
Muhammad Zeshan Afzal
181
3
0
10 Jun 2024
Reconstructing training data from document understanding models
Jérémie Dentan
Arnaud Paran
A. Shabou
AAML
SyDa
200
3
0
05 Jun 2024
Lightweight Spatial Modeling for Combinatorial Information Extraction From Documents
Yanfei Dong
Lambert Deng
Jiazheng Zhang
Xiaodong Yu
Ting Lin
Francesco Gelli
Soujanya Poria
W. Lee
163
0
0
08 May 2024
GeoContrastNet: Contrastive Key-Value Edge Learning for Language-Agnostic Document Understanding
IEEE International Conference on Document Analysis and Recognition (ICDAR), 2024
Nil Biescas
Carlos Boned Riera
Josep Lladós
Sanket Biswas
184
4
0
06 May 2024
CREPE: Coordinate-Aware End-to-End Document Parser
Yamato Okamoto
Youngmin Baek
Geewook Kim
Ryota Nakao
Donghyun Kim
Moonbin Yim
Seunghyun Park
Bado Lee
204
2
0
01 May 2024
Multi-Page Document Visual Question Answering using Self-Attention Scoring Mechanism
Lei Kang
Rubèn Pérez Tito
Ernest Valveny
Dimosthenis Karatzas
240
10
0
29 Apr 2024
Improve Academic Query Resolution through BERT-based Question Extraction from Images
Nidhi Kamal
Saurabh Yadav
Jorawar Singh
Aditi Avasthi
161
0
0
28 Apr 2024
A Hybrid Approach for Document Layout Analysis in Document images
Tahira Shehzadi
Didier Stricker
Muhammad Zeshan Afzal
182
11
0
27 Apr 2024
A review of deep learning-based information fusion techniques for multimodal medical image classification
Yi-Hsuan Li
Mostafa EL HABIB DAHO
Pierre-Henri Conze
Rachid Zeghlache
Hugo Le Boité
R. Tadayoni
B. Cochener
M. Lamard
G. Quellec
152
105
0
23 Apr 2024
Towards Efficient Resume Understanding: A Multi-Granularity Multi-Modal Pre-Training Approach
Feihu Jiang
Chuan Qin
Jingshuai Zhang
Kaichun Yao
Xi Chen
Dazhong Shen
Chen Zhu
Hengshu Zhu
Hui Xiong
174
11
0
13 Apr 2024
HRVDA: High-Resolution Visual Document Assistant
Computer Vision and Pattern Recognition (CVPR), 2024
Chaohu Liu
Kun Yin
Haoyu Cao
Xinghua Jiang
Xin Li
Yinsong Liu
Deqiang Jiang
Xing Sun
Linli Xu
VLM
243
30
0
10 Apr 2024
LayoutLLM: Layout Instruction Tuning with Large Language Models for Document Understanding
Chuwei Luo
Yufan Shen
Zhaoqing Zhu
Qi Zheng
Zhi Yu
Cong Yao
325
90
0
08 Apr 2024
Bidirectional Long-Range Parser for Sequential Data Understanding
George Leotescu
Daniel Voinea
A. Popa
189
1
0
08 Apr 2024
BuDDIE: A Business Document Dataset for Multi-task Information Extraction
Ran Zmigrod
Dongsheng Wang
Mathieu Sibue
Yulong Pei
Petr Babkin
...
Antony Papadimitriou
William Watson
Zhiqiang Ma
Armineh Nourbakhsh
Sameena Shah
200
7
0
05 Apr 2024
Noise-Aware Training of Layout-Aware Language Models
Ritesh Sarkhel
Xiaoqi Ren
Lauro Beltrao Costa
Guolong Su
Vincent Perot
Yanan Xie
Emmanouil Koukoumidis
Arnab Nandi
VLM
181
0
0
30 Mar 2024
OmniParser: A Unified Framework for Text Spotting, Key Information Extraction and Table Recognition
Jianqiang Wan
Sibo Song
Wenwen Yu
Yuliang Liu
Wenqing Cheng
Fei Huang
Xiang Bai
Cong Yao
Zhibo Yang
247
72
0
28 Mar 2024
Can AI Models Appreciate Document Aesthetics? An Exploration of Legibility and Layout Quality in Relation to Prediction Confidence
Hsiu-Wei Yang
Abhinav Agrawal
Pavlos Fragkogiannis
Shubham Nitin Mulay
234
2
0
27 Mar 2024
Visually Guided Generative Text-Layout Pre-training for Document Intelligence
Zhiming Mao
Haoli Bai
Lu Hou
Jiansheng Wei
Xin Jiang
Qun Liu
Kam-Fai Wong
185
11
0
25 Mar 2024
RoDLA: Benchmarking the Robustness of Document Layout Analysis Models
Yufan Chen
Kailai Li
Kunyu Peng
Junwei Zheng
Ruiping Liu
Juil Sock
Rainer Stiefelhagen
OOD
157
13
0
21 Mar 2024
Transformers and Language Models in Form Understanding: A Comprehensive Review of Scanned Document Analysis
Abdelrahman Abdallah
Daniel Eberharter
Zoe Pfister
Adam Jatowt
165
15
0
06 Mar 2024
Enhancing Vision-Language Pre-training with Rich Supervisions
Yuan Gao
Kunyu Shi
Pengkai Zhu
Edouard Belval
Oren Nuriel
Srikar Appalaraju
Shabnam Ghadar
Vijay Mahadevan
Zhuowen Tu
Stefano Soatto
VLM
CLIP
346
15
0
05 Mar 2024
Hierarchical Multimodal Pre-training for Visually Rich Webpage Understanding
Hongshen Xu
Lu Chen
Zihan Zhao
Da Ma
Ruisheng Cao
Zichen Zhu
Kai Yu
153
5
0
28 Feb 2024
Improving Language Understanding from Screenshots
Tianyu Gao
Zirui Wang
Adithya Bhaskar
Danqi Chen
VLM
176
13
0
21 Feb 2024
LAPDoc: Layout-Aware Prompting for Documents
Marcel Lamott
Yves-Noel Weweler
A. Ulges
Faisal Shafait
Dirk Krechel
Darko Obradovic
283
16
0
15 Feb 2024
Beyond the Mud: Datasets and Benchmarks for Computer Vision in Off-Road Racing
Jacob Tyo
Motolani Olarinre
Youngseog Chung
Zachary Chase Lipton
159
0
0
12 Feb 2024
TreeForm: End-to-end Annotation and Evaluation for Form Document Parsing
Ran Zmigrod
Zhiqiang Ma
Armineh Nourbakhsh
Sameena Shah
180
4
0
07 Feb 2024
Hi-SAM: Marrying Segment Anything Model for Hierarchical Text Segmentation
Maoyuan Ye
Jing Zhang
Juhua Liu
Chenyu Liu
Baocai Yin
Cong Liu
Bo Du
Dacheng Tao
VLM
190
31
0
31 Jan 2024
InstructDoc: A Dataset for Zero-Shot Generalization of Visual Document Understanding with Instructions
AAAI Conference on Artificial Intelligence (AAAI), 2024
Ryota Tanaka
Taichi Iki
Kyosuke Nishida
Kuniko Saito
Jun Suzuki
VLM
210
33
0
24 Jan 2024
Watermark Text Pattern Spotting in Document Images
Mateusz Krubiński
Stefan Matcovici
Diana Grigore
Daniel Voinea
A. Popa
WaLM
181
3
0
10 Jan 2024
PEneo: Unifying Line Extraction, Line Grouping, and Entity Linking for End-to-end Document Pair Extraction
Zening Lin
Jiapeng Wang
Teng Li
Wenhui Liao
Dayi Huang
Longfei Xiong
Lianwen Jin
174
3
0
07 Jan 2024
GRAM: Global Reasoning for Multi-Page VQA
Tsachi Blau
Sharon Fogel
Roi Ronen
Alona Golts
Roy Ganz
Elad Ben Avraham
Aviad Aberdam
Shahar Tsiper
Ron Litman
205
21
0
07 Jan 2024
DocGraphLM: Documental Graph Language Model for Information Extraction
Dongsheng Wang
Zhiqiang Ma
Armineh Nourbakhsh
Kang Gu
Sameena Shah
160
13
0
05 Jan 2024
LANS: A Layout-Aware Neural Solver for Plane Geometry Problem
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Zhong-Zhi Li
Ming-Liang Zhang
Fei Yin
Cheng-Lin Liu
184
20
0
25 Nov 2023
DocPedia: Unleashing the Power of Large Multimodal Model in the Frequency Domain for Versatile Document Understanding
Hao Feng
Qi Liu
Hao Liu
Wen-gang Zhou
Houqiang Li
Can Huang
VLM
285
92
0
20 Nov 2023
Efficient End-to-End Visual Document Understanding with Rationale Distillation
Peng Guo
Alekh Agarwal
Mandar Joshi
Robin Jia
Jesse Thomason
Kristina Toutanova
136
4
0
16 Nov 2023
DEED: Dynamic Early Exit on Decoder for Accelerating Encoder-Decoder Transformer Models
Peng Tang
Pengkai Zhu
Tian Li
Srikar Appalaraju
Vijay Mahadevan
R. Manmatha
192
9
0
15 Nov 2023
Multiple-Question Multiple-Answer Text-VQA
North American Chapter of the Association for Computational Linguistics (NAACL), 2023
Peng Tang
Srikar Appalaraju
R. Manmatha
Yusheng Xie
Vijay Mahadevan
198
7
0
15 Nov 2023
Reading Between the Mud: A Challenging Motorcycle Racer Number Dataset
Jacob Tyo
Youngseog Chung
Motolani Olarinre
Zachary Chase Lipton
120
0
0
14 Nov 2023
ETDPC: A Multimodality Framework for Classifying Pages in Electronic Theses and Dissertations
Muntabir Hasan Choudhury
Lamia Salsabil
William A. Ingram
Edward A. Fox
Jian Wu
142
0
0
07 Nov 2023
Image Generation and Learning Strategy for Deep Document Forgery Detection
Yamato Okamoto
Osada Genki
Iu Yahiro
Rintaro Hasegawa
Peifei Zhu
Hirokatsu Kataoka
AAML
211
4
0
07 Nov 2023
On Task-personalized Multimodal Few-shot Learning for Visually-rich Document Entity Retrieval
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Jiayi Chen
H. Dai
Bo Dai
Aidong Zhang
Wei Wei
259
3
0
01 Nov 2023
Exploring OCR Capabilities of GPT-4V(ision) : A Quantitative and In-depth Evaluation
Yongxin Shi
Dezhi Peng
Wenhui Liao
Zening Lin
Xinhong Chen
Chongyu Liu
Yuyi Zhang
Lianwen Jin
MLLM
363
52
0
25 Oct 2023
Enhancing Document Information Analysis with Multi-Task Pre-training: A Robust Approach for Information Extraction in Visually-Rich Documents
IEEE International Joint Conference on Neural Networks (IJCNN), 2023
Tofik Ali
Partha Pratim Roy
199
0
0
25 Oct 2023
A Multi-Modal Multilingual Benchmark for Document Image Classification
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Yoshinari Fujinuma
Siddharth Varia
Nishant Sankaran
Srikar Appalaraju
Bonan Min
Yogarshi Vyas
VLM
211
5
0
25 Oct 2023