2106.11539
DocFormer: End-to-End Transformer for Document Understanding
IEEE International Conference on Computer Vision (ICCV), 2021
22 June 2021
Srikar Appalaraju
Bhavan A. Jasani
Bhargava Urala Kota
Yusheng Xie
R. Manmatha
ViT
Papers citing
"DocFormer: End-to-End Transformer for Document Understanding"
50 / 205 papers shown
CLIPTER: Looking at the Bigger Picture in Scene Text Recognition
IEEE International Conference on Computer Vision (ICCV), 2023
Aviad Aberdam
David Bensaid
Alona Golts
Roy Ganz
Oren Nuriel
Royee Tichauer
Shai Mazor
Ron Litman
VLM
CLIP
328
26
0
18 Jan 2023
Towards Models that Can See and Read
IEEE International Conference on Computer Vision (ICCV), 2023
Roy Ganz
Oren Nuriel
Aviad Aberdam
Yair Kittenplon
Shai Mazor
Ron Litman
294
16
0
18 Jan 2023
An Augmentation Strategy for Visually Rich Documents
Jing Xie
James Bradley Wendt
Yichao Zhou
Seth Ebner
Sandeep Tata
234
0
0
20 Dec 2022
Wukong-Reader: Multi-modal Pre-training for Fine-grained Visual Document Understanding
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Haoli Bai
Zhiguang Liu
Xiaojun Meng
Wentao Li
Shuangning Liu
...
Liangwei Wang
Lu Hou
Jiansheng Wei
Xin Jiang
Qun Liu
ViT
230
18
0
19 Dec 2022
CLIPPO: Image-and-Language Understanding from Pixels Only
Computer Vision and Pattern Recognition (CVPR), 2022
Michael Tschannen
Basil Mustafa
N. Houlsby
CLIP
VLM
343
74
0
15 Dec 2022
Unifying Vision, Text, and Layout for Universal Document Processing
Computer Vision and Pattern Recognition (CVPR), 2022
Zineng Tang
Ziyi Yang
Guoxin Wang
Yuwei Fang
Yang Liu
Chenguang Zhu
Michael Zeng
Chao-Yue Zhang
Joey Tianyi Zhou
VLM
346
153
0
05 Dec 2022
MGDoc: Pre-training with Multi-granular Hierarchy for Document Image Understanding
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Zilong Wang
Jiuxiang Gu
Chris Tensmeyer
Nikolaos Barmpalios
A. Nenkova
Tong Sun
Jingbo Shang
Vlad I. Morariu
VLM
165
13
0
27 Nov 2022
Alignment-Enriched Tuning for Patch-Level Pre-trained Document Image Models
AAAI Conference on Artificial Intelligence (AAAI), 2022
Lei Wang
Jian He
Xingdong Xu
Ning Liu
Hui-juan Liu
210
2
0
27 Nov 2022
YORO -- Lightweight End to End Visual Grounding
Chih-Hui Ho
Srikar Appalaraju
Bhavan A. Jasani
R. Manmatha
Nuno Vasconcelos
ObjD
173
27
0
15 Nov 2022
VRDU: A Benchmark for Visually-rich Document Understanding
Knowledge Discovery and Data Mining (KDD), 2022
Zilong Wang
Yichao Zhou
Wei Wei
Chen-Yu Lee
Sandeep Tata
168
26
0
15 Nov 2022
QueryForm: A Simple Zero-shot Form Entity Query Framework
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Zifeng Wang
Zizhao Zhang
Jacob Devlin
Chen-Yu Lee
Guolong Su
Hao Zhang
Jennifer Dy
Vincent Perot
Tomas Pfister
128
8
0
14 Nov 2022
FormLM: Recommending Creation Ideas for Online Forms by Modelling Semantic and Structural Information
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Yijia Shao
Mengyu Zhou
Yifan Zhong
Tao Wu
Hongwei Han
Shi Han
Gideon Huang
Dongmei Zhang
3DV
259
4
0
10 Nov 2022
DoSA: A System to Accelerate Annotations on Business Documents with Human-in-the-Loop
Neelesh K Shukla
Msp Raja
Raghu Katikeri
Amit Vaid
86
1
0
09 Nov 2022
Evaluating Out-of-Distribution Performance on Document Image Classifiers
Neural Information Processing Systems (NeurIPS), 2022
Stefan Larson
Gordon Lim
Yutong Ai
David Kuang
Kevin Leach
OODD
OOD
290
21
0
14 Oct 2022
ERNIE-Layout: Layout Knowledge Enhanced Pre-training for Visually-rich Document Understanding
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Qiming Peng
Yinxu Pan
Wenjin Wang
Bin Luo
Zhenyu Zhang
...
Shi Feng
Yu Sun
Hao Tian
Hua Wu
Haifeng Wang
197
101
0
12 Oct 2022
Pix2Struct: Screenshot Parsing as Pretraining for Visual Language Understanding
International Conference on Machine Learning (ICML), 2022
Kenton Lee
Mandar Joshi
Iulia Turc
Hexiang Hu
Fangyu Liu
Julian Martin Eisenschlos
Urvashi Khandelwal
Peter Shaw
Ming-Wei Chang
Kristina Toutanova
CLIP
VLM
826
374
0
07 Oct 2022
XDoc: Unified Pre-training for Cross-Format Document Understanding
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Jingye Chen
Tengchao Lv
Lei Cui
Changrong Zhang
Furu Wei
271
16
0
06 Oct 2022
ERNIE-mmLayout: Multi-grained MultiModal Transformer for Document Understanding
Wenjin Wang
Zhengjie Huang
Bin Luo
Qianglong Chen
Qiming Peng
...
Weichong Yin
Shi Feng
Yu Sun
Dianhai Yu
Yin Zhang
ViT
181
13
0
18 Sep 2022
One-Shot Doc Snippet Detection: Powering Search in Document Beyond Text
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2022
Abhinav Java
Shripad Deshmukh
Milan Aggarwal
Surgan Jandial
Mausoom Sarkar
Balaji Krishnamurthy
177
3
0
12 Sep 2022
Doc2Graph: a Task Agnostic Document Understanding Framework based on Graph Neural Networks
Andrea Gemelli
Sanket Biswas
Enrico Civitelli
Josep Lladós
S. Marinai
161
22
0
23 Aug 2022
Understanding Long Documents with Different Position-Aware Attentions
Hai Pham
Guoxin Wang
Yijuan Lu
D. Florêncio
Changrong Zhang
167
10
0
17 Aug 2022
Knowing Where and What: Unified Word Block Pretraining for Document Understanding
Song Tao
Zijian Wang
Tiantian Fan
Canjie Luo
Can Huang
SSL
250
2
0
28 Jul 2022
Towards Complex Document Understanding By Discrete Reasoning
ACM Multimedia (ACM MM), 2022
Fengbin Zhu
Wenqiang Lei
Fuli Feng
Chao Wang
Haozhou Zhang
Tat-Seng Chua
343
83
0
25 Jul 2022
Layout-Aware Information Extraction for Document-Grounded Dialogue: Dataset, Method and Demonstration
ACM Multimedia (ACM MM), 2022
Zhenyu Zhang
Yu Bowen
Haiyang Yu
Tingwen Liu
Cheng Fu
Jingyang Li
Chengguang Tang
Jian Sun
Yongbin Li
255
5
0
14 Jul 2022
GMN: Generative Multi-modal Network for Practical Document Information Extraction
North American Chapter of the Association for Computational Linguistics (NAACL), 2022
H. Cao
Jiefeng Ma
Antai Guo
Yiqing Hu
Hao Liu
Deqiang Jiang
Yinsong Liu
Bo Ren
130
9
0
11 Jul 2022
Bi-VLDoc: Bidirectional Vision-Language Modeling for Visually-Rich Document Understanding
International Journal on Document Analysis and Recognition (IJDAR), 2022
Chuwei Luo
Guozhi Tang
Qi Zheng
Cong Yao
Lianwen Jin
Chenliang Li
Yang Xue
Luo Si
274
22
0
27 Jun 2022
MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge
Neural Information Processing Systems (NeurIPS), 2022
Linxi Fan
Guanzhi Wang
Yunfan Jiang
Ajay Mandlekar
Yuncong Yang
Haoyi Zhu
Andrew Tang
De-An Huang
Yuke Zhu
Anima Anandkumar
LM&Ro
520
496
0
17 Jun 2022
MixGen: A New Multi-Modal Data Augmentation
Xiaoshuai Hao
Yi Zhu
Srikar Appalaraju
Aston Zhang
Wanqian Zhang
Boyang Li
Mu Li
VLM
399
122
0
16 Jun 2022
Test-Time Adaptation for Visual Document Understanding
Sayna Ebrahimi
Sercan O. Arik
Tomas Pfister
OOD
229
6
0
15 Jun 2022
RDU: A Region-based Approach to Form-style Document Understanding
Fengbin Zhu
Chao Wang
Wenqiang Lei
Ziyang Liu
Tat-Seng Chua
161
2
0
14 Jun 2022
Multimodal Learning with Transformers: A Survey
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Peng Xu
Xiatian Zhu
David Clifton
ViT
569
846
0
13 Jun 2022
VLCDoC: Vision-Language Contrastive Pre-Training Model for Cross-Modal Document Classification
Pattern Recognition (Pattern Recogn.), 2022
Souhail Bakkali
Zuheng Ming
Mickael Coustaty
Marçal Rusiñol
O. R. Terrades
VLM
281
36
0
24 May 2022
MATrIX -- Modality-Aware Transformer for Information eXtraction
Thomas Delteil
Edouard Belval
Lei Chen
Luis Goncalves
Vijay Mahadevan
209
3
0
17 May 2022
Relational Representation Learning in Visually-Rich Documents
ACM Multimedia (ACM MM), 2022
Xin Li
Yan Zheng
Yiqing Hu
H. Cao
Yunfei Wu
Deqiang Jiang
Yinsong Liu
Bo Ren
256
15
0
05 May 2022
LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking
ACM Multimedia (ACM MM), 2022
Yupan Huang
Tengchao Lv
Lei Cui
Yutong Lu
Furu Wei
680
645
0
18 Apr 2022
End-to-end Document Recognition and Understanding with Dessurt
Brian L. Davis
B. Morse
Brian L. Price
Chris Tensmeyer
Curtis Wigington
Vlad I. Morariu
VLM
ViT
420
85
0
30 Mar 2022
Towards End-to-End Unified Scene Text Detection and Layout Analysis
Computer Vision and Pattern Recognition (CVPR), 2022
Shangbang Long
Siyang Qin
Dmitry Panteleev
Alessandro Bissacco
Yasuhisa Fujii
Michalis Raptis
269
114
0
28 Mar 2022
Multimodal Pre-training Based on Graph Attention Network for Document Understanding
IEEE Transactions on Multimedia (IEEE TMM), 2022
Zhenrong Zhang
Jiefeng Ma
Jun Du
Licheng Wang
Jianshu Zhang
210
48
0
25 Mar 2022
FormNet: Structural Encoding beyond Sequential Modeling in Form Document Information Extraction
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Chen-Yu Lee
Chun-Liang Li
Timothy Dozat
Vincent Perot
Guolong Su
Nan Hua
Joshua Ainslie
Renshen Wang
Yasuhisa Fujii
Tomas Pfister
209
88
0
16 Mar 2022
XYLayoutLM: Towards Layout-Aware Multimodal Networks For Visually-Rich Document Understanding
Computer Vision and Pattern Recognition (CVPR), 2022
Zhangxuan Gu
Changhua Meng
Ke Wang
Jun Lan
Weiqiang Wang
Ming Gu
Liqing Zhang
226
95
0
14 Mar 2022
Image Search with Text Feedback by Additive Attention Compositional Learning
Yuxin Tian
Shawn D. Newsam
K. Boakye
CoGe
153
13
0
08 Mar 2022
DiT: Self-supervised Pre-training for Document Image Transformer
ACM Multimedia (ACM MM), 2022
Junlong Li
Yiheng Xu
Tengchao Lv
Lei Cui
Chaoxi Zhang
Furu Wei
ViT
VLM
399
211
0
04 Mar 2022
LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Jiapeng Wang
Lianwen Jin
Kai Ding
VLM
224
178
0
28 Feb 2022
OCR-IDL: OCR Annotations for Industry Document Library Dataset
Ali Furkan Biten
Rubèn Pérez Tito
Lluís Gómez
Ernest Valveny
Dimosthenis Karatzas
186
43
0
25 Feb 2022
A Dataset for Interactive Vision-Language Navigation with Unknown Command Feasibility
European Conference on Computer Vision (ECCV), 2022
Andrea Burns
Deniz Arsan
Sanjna Agrawal
Ranjitha Kumar
Kate Saenko
Bryan A. Plummer
422
82
0
04 Feb 2022
DocSegTr: An Instance-Level End-to-End Document Image Segmentation Transformer
Sanket Biswas
Ayan Banerjee
Josep Lladós
Umapada Pal
ViT
326
30
0
27 Jan 2022
DocEnTr: An End-to-End Document Image Enhancement Transformer
International Conference on Pattern Recognition (ICPR), 2022
Mohamed Ali Souibgui
Sanket Biswas
Sana Khamekhem Jemni
Yousri Kessentini
Alicia Fornés
Josep Lladós
Umapada Pal
ViT
234
57
0
25 Jan 2022
Table Pre-training: A Survey on Model Architectures, Pre-training Objectives, and Downstream Tasks
International Joint Conference on Artificial Intelligence (IJCAI), 2022
Haoyu Dong
Zhoujun Cheng
Xinyi He
Mengyuan Zhou
Anda Zhou
Fan Zhou
Ao Liu
Shi Han
Dongmei Zhang
LMTD
419
74
0
24 Jan 2022
LaTr: Layout-Aware Transformer for Scene-Text VQA
Computer Vision and Pattern Recognition (CVPR), 2021
Ali Furkan Biten
Ron Litman
Yusheng Xie
Srikar Appalaraju
R. Manmatha
ViT
380
117
0
23 Dec 2021
Value Retrieval with Arbitrary Queries for Form-like Documents
M. Gao
Le Xue
Chetan Ramaiah
Chen Xing
Ran Xu
Caiming Xiong
272
6
0
15 Dec 2021