
DocFormer: End-to-End Transformer for Document Understanding

IEEE International Conference on Computer Vision (ICCV), 2021

22 June 2021

Bhargava Urala Kota

Papers citing "DocFormer: End-to-End Transformer for Document Understanding"

50 / 205 papers shown

CLIPTER: Looking at the Bigger Picture in Scene Text Recognition

IEEE International Conference on Computer Vision (ICCV), 2023

328

18 Jan 2023

Towards Models that Can See and Read

IEEE International Conference on Computer Vision (ICCV), 2023

294

18 Jan 2023

An Augmentation Strategy for Visually Rich Documents

234

20 Dec 2022

Wukong-Reader: Multi-modal Pre-training for Fine-grained Visual Document Understanding

Annual Meeting of the Association for Computational Linguistics (ACL), 2022

...

Xin Jiang

Qun Liu

ViT

230

19 Dec 2022

CLIPPO: Image-and-Language Understanding from Pixels Only

Computer Vision and Pattern Recognition (CVPR), 2022

343

15 Dec 2022

Unifying Vision, Text, and Layout for Universal Document Processing

Computer Vision and Pattern Recognition (CVPR), 2022

Yang Liu

346

153

05 Dec 2022

MGDoc: Pre-training with Multi-granular Hierarchy for Document Image Understanding

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022

Jiuxiang Gu

165

27 Nov 2022

Alignment-Enriched Tuning for Patch-Level Pre-trained Document Image Models

AAAI Conference on Artificial Intelligence (AAAI), 2022

Lei Wang

210

27 Nov 2022

YORO -- Lightweight End to End Visual Grounding

173

15 Nov 2022

VRDU: A Benchmark for Visually-rich Document Understanding

Knowledge Discovery and Data Mining (KDD), 2022

Chen-Yu Lee

168

15 Nov 2022

QueryForm: A Simple Zero-shot Form Entity Query Framework

Annual Meeting of the Association for Computational Linguistics (ACL), 2022

Chen-Yu Lee

Jennifer Dy

Tomas Pfister

128

14 Nov 2022

FormLM: Recommending Creation Ideas for Online Forms by Modelling Semantic and Structural Information

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022

259

10 Nov 2022

DoSA : A System to Accelerate Annotations on Business Documents with Human-in-the-Loop

09 Nov 2022

Evaluating Out-of-Distribution Performance on Document Image Classifiers

Neural Information Processing Systems (NeurIPS), 2022

290

14 Oct 2022

ERNIE-Layout: Layout Knowledge Enhanced Pre-training for Visually-rich Document Understanding

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022

...

197

101

12 Oct 2022

Pix2Struct: Screenshot Parsing as Pretraining for Visual Language Understanding

International Conference on Machine Learning (ICML), 2022

Julian Martin Eisenschlos

826

374

07 Oct 2022

XDoc: Unified Pre-training for Cross-Format Document Understanding

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022

271

06 Oct 2022

ERNIE-mmLayout: Multi-grained MultiModal Transformer for Document Understanding

...

Dianhai Yu

181

18 Sep 2022

One-Shot Doc Snippet Detection: Powering Search in Document Beyond Text

IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2022

177

12 Sep 2022

Doc2Graph: a Task Agnostic Document Understanding Framework based on Graph Neural Networks

Andrea Gemelli

Sanket Biswas

Enrico Civitelli

Josep Lladós

S. Marinai

161

23 Aug 2022

Understanding Long Documents with Different Position-Aware Attentions

167

17 Aug 2022

Knowing Where and What: Unified Word Block Pretraining for Document Understanding

250

28 Jul 2022

Towards Complex Document Understanding By Discrete Reasoning

ACM Multimedia (ACM MM), 2022

343

25 Jul 2022

Layout-Aware Information Extraction for Document-Grounded Dialogue: Dataset, Method and Demonstration

ACM Multimedia (ACM MM), 2022

255

14 Jul 2022

GMN: Generative Multi-modal Network for Practical Document Information Extraction

North American Chapter of the Association for Computational Linguistics (NAACL), 2022

130

11 Jul 2022

Bi-VLDoc: Bidirectional Vision-Language Modeling for Visually-Rich Document Understanding

International Journal on Document Analysis and Recognition (IJDAR), 2022

274

27 Jun 2022

MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge

Neural Information Processing Systems (NeurIPS), 2022

Linxi Fan

De-An Huang

520

496

17 Jun 2022

MixGen: A New Multi-Modal Data Augmentation

399

122

16 Jun 2022

Test-Time Adaptation for Visual Document Understanding

Sayna Ebrahimi

Sercan O. Arik

Tomas Pfister

OOD

229

15 Jun 2022

RDU: A Region-based Approach to Form-style Document Understanding

161

14 Jun 2022

Multimodal Learning with Transformers: A Survey

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022

569

846

13 Jun 2022

VLCDoC: Vision-Language Contrastive Pre-Training Model for Cross-Modal Document Classification

Pattern Recognition (Pattern Recogn.), 2022

281

24 May 2022

MATrIX -- Modality-Aware Transformer for Information eXtraction

209

17 May 2022

Relational Representation Learning in Visually-Rich Documents

ACM Multimedia (ACM MM), 2022

Xin Li

256

05 May 2022

LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking

ACM Multimedia (ACM MM), 2022

680

645

18 Apr 2022

End-to-end Document Recognition and Understanding with Dessurt

420

30 Mar 2022

Towards End-to-End Unified Scene Text Detection and Layout Analysis

Computer Vision and Pattern Recognition (CVPR), 2022

Yasuhisa Fujii

269

114

28 Mar 2022

Multimodal Pre-training Based on Graph Attention Network for Document Understanding

IEEE Transactions on Multimedia (IEEE TMM), 2022

Jun Du

210

25 Mar 2022

FormNet: Structural Encoding beyond Sequential Modeling in Form Document Information Extraction

Annual Meeting of the Association for Computational Linguistics (ACL), 2022

Chen-Yu Lee

Chun-Liang Li

Joshua Ainslie

Yasuhisa Fujii

Tomas Pfister

209

16 Mar 2022

XYLayoutLM: Towards Layout-Aware Multimodal Networks For Visually-Rich Document Understanding

Computer Vision and Pattern Recognition (CVPR), 2022

226

14 Mar 2022

Image Search with Text Feedback by Additive Attention Compositional Learning

153

08 Mar 2022

DiT: Self-supervised Pre-training for Document Image Transformer

ACM Multimedia (ACM MM), 2022

399

211

04 Mar 2022

LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding

Annual Meeting of the Association for Computational Linguistics (ACL), 2022

Jiapeng Wang

Lianwen Jin

Kai Ding

VLM

224

178

28 Feb 2022

OCR-IDL: OCR Annotations for Industry Document Library Dataset

186

25 Feb 2022

A Dataset for Interactive Vision-Language Navigation with Unknown Command Feasibility

European Conference on Computer Vision (ECCV), 2022

422

04 Feb 2022

DocSegTr: An Instance-Level End-to-End Document Image Segmentation Transformer

Sanket Biswas

Ayan Banerjee

Josep Lladós

Umapada Pal

ViT

326

27 Jan 2022

DocEnTr: An End-to-End Document Image Enhancement Transformer

International Conference on Pattern Recognition (ICPR), 2022

Sanket Biswas

Josep Lladós

234

25 Jan 2022

Table Pre-training: A Survey on Model Architectures, Pre-training Objectives, and Downstream Tasks

International Joint Conference on Artificial Intelligence (IJCAI), 2022

419

24 Jan 2022

LaTr: Layout-Aware Transformer for Scene-Text VQA

Computer Vision and Pattern Recognition (CVPR), 2021

380

117

23 Dec 2021

Value Retrieval with Arbitrary Queries for Form-like Documents

Ran Xu

272

15 Dec 2021