arXiv: 2106.11539

DocFormer: End-to-End Transformer for Document Understanding

IEEE International Conference on Computer Vision (ICCV), 2021

22 June 2021

Srikar Appalaraju

Bhavan A. Jasani

Bhargava Urala Kota

Papers citing "DocFormer: End-to-End Transformer for Document Understanding"

50 / 205 papers shown

GenKIE: Robust Generative Multimodal Document Key Information Extraction

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

154

9

0

24 Oct 2023

Vision-Enhanced Semantic Entity Recognition in Document Images via Visually-Asymmetric Consistency Learning

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

196

1

0

23 Oct 2023

PHD: Pixel-Based Language Modeling of Historical Documents

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

Nadav Borenstein

Desmond Elliott

Isabelle Augenstein

275

6

0

22 Oct 2023

DSG: An End-to-End Document Structure Generator

Johannes Rausch

Gentiana Rashiti

Stefan Feuerriegel

250

4

0

13 Oct 2023

ProtoNER: Few shot Incremental Learning for Named Entity Recognition using Prototypical Networks

Vatche Isahagian

196

5

0

03 Oct 2023

Analyzing the Efficacy of an LLM-Only Approach for Image-based Document Question Answering

Gaurav Aggarwal

220

9

0

25 Sep 2023

Document Understanding for Healthcare Referrals

IEEE International Conference on Healthcare Informatics (ICHI), 2023

95

1

0

22 Sep 2023

Kosmos-2.5: A Multimodal Literate Model

...

260

89

0

20 Sep 2023

LMDX: Language Model-based Document Information Extraction and Localization

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

Florian Luisier

...

Chen-Yu Lee

227

52

0

19 Sep 2023

Vision Grid Transformer for Document Layout Analysis

IEEE International Conference on Computer Vision (ICCV), 2023

237

52

0

29 Aug 2023

Nougat: Neural Optical Understanding for Academic Documents

International Conference on Learning Representations (ICLR), 2023

Guillem Cucurull

203

178

0

25 Aug 2023

Beyond Document Page Classification: Design, Datasets, and Challenges

IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023

Jordy Van Landeghem

Sanket Biswas

Matthew B. Blaschko

Marie-Francine Moens

212

9

0

24 Aug 2023

Enhancing Visually-Rich Document Understanding via Layout Structure Modeling

ACM Multimedia (ACM MM), 2023

Bo Du

147

11

0

15 Aug 2023

RealCQA: Scientific Chart Question Answering as a Test-bed for First-Order Logic

IEEE International Conference on Document Analysis and Recognition (ICDAR), 2023

Venugopal Govindaraju

154

7

0

03 Aug 2023

SpaDen : Sparse and Dense Keypoint Estimation for Real-World Chart Understanding

IEEE International Conference on Document Analysis and Recognition (ICDAR), 2023

Venugopal Govindaraju

108

2

0

03 Aug 2023

A Real-World WebAgent with Planning, Long Context Understanding, and Program Synthesis

International Conference on Learning Representations (ICLR), 2023

Hiroki Furuta

Mustafa Safdari

Aleksandra Faust

567

315

0

24 Jul 2023

DocTr: Document Transformer for Structured Information Extraction in Documents

IEEE International Conference on Computer Vision (ICCV), 2023

Aruni RoyChowdhury

Vijay Mahadevan

196

22

0

16 Jul 2023

On Evaluation of Document Classification using RVL-CDIP

261

4

0

21 Jun 2023

DocumentNet: Bridging the Data Gap in Document Pre-Training

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

Alexander G. Hauptmann

97

3

0

15 Jun 2023

DocumentCLIP: Linking Figures and Main Body Text in Reflowed Documents

Fuxiao Liu

Chris Tensmeyer

273

18

0

09 Jun 2023

ICDAR 2023 Competition on Structured Text Extraction from Visually-Rich Document Images

IEEE International Conference on Document Analysis and Recognition (ICDAR), 2023

Chengquan Zhang

...

Dimosthenis Karatzas

Jingdong Wang

197

18

0

05 Jun 2023

DocFormerv2: Local Features for Document Understanding

AAAI Conference on Artificial Intelligence (AAAI), 2023

Srikar Appalaraju

Nishant Sankaran

247

57

0

02 Jun 2023

End-to-End Document Classification and Key Information Extraction using Assignment Optimization

Mairead O'Cuinn

181

1

0

01 Jun 2023

Layout and Task Aware Instruction Prompt for Zero-shot Document Image Question Answering

403

34

0

01 Jun 2023

LayoutMask: Enhance Text-Layout Interaction in Multi-modal Pre-training for Document Understanding

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

202

22

0

30 May 2023

Benchmarking Diverse-Modal Entity Linking with Generative Models

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

Alexander Hanbo Li

...

Vittorio Castelli

285

12

0

27 May 2023

Visually-Situated Natural Language Understanding with Contrastive Reading Model and Frozen Large Language Models

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

345

4

0

24 May 2023

Towards Few-shot Entity Recognition in Document Images: A Graph Neural Network Approach Robust to Image Manipulation

International Conference on Language Resources and Evaluation (LREC), 2023

Prashant Krishnan

253

3

0

24 May 2023

DUBLIN -- Document Understanding By Language-Image Network

Aditi Khandelwal

Owais Mohammed Khan

Monojit Choudhury

Hardik Hansrajbhai Chauhan

Vishrav Chaudhary

310

0

0

23 May 2023

Global Structure Knowledge-Guided Relation Extraction Method for Visually-Rich Document

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

Xiaozhong Liu

211

6

0

23 May 2023

Multimodal Web Navigation with Instruction-Finetuned Foundation Models

International Conference on Learning Representations (ICLR), 2023

Hiroki Furuta

Aleksandra Faust

413

142

0

19 May 2023

Fast-StrucTexT: An Efficient Hourglass Transformer with Modality-guided Dynamic Token Merge for Document Understanding

International Joint Conference on Artificial Intelligence (IJCAI), 2023

Chengquan Zhang

144

8

0

19 May 2023

Sequence-to-Sequence Pre-training with Unified Modality Masking for Visual Document Understanding

Trung Quoc Luong

105

2

0

16 May 2023

Document Understanding Dataset and Evaluation (DUDE)

IEEE International Conference on Computer Vision (ICCV), 2023

Jordy Van Landeghem

Rubèn Pérez Tito

Łukasz Borchmann

Michal Pietruszka

...

Bertrand Ackaert

Matthew Blaschko

Tomasz Stanislawek

298

109

0

15 May 2023

SwinDocSegmenter: An End-to-End Unified Domain Adaptive Transformer for Document Instance Segmentation

IEEE International Conference on Document Analysis and Recognition (ICDAR), 2023

Sanket Biswas

Josep Lladós

189

21

0

08 May 2023

Text Reading Order in Uncontrolled Conditions by Sparse Graph Segmentation

IEEE International Conference on Document Analysis and Recognition (ICDAR), 2023

Yasuhisa Fujii

Alessandro Bissacco

132

7

0

04 May 2023

FormNetV2: Multimodal Graph Contrastive Learning for Form Document Information Extraction

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

Chun-Liang Li

...

Yasuhisa Fujii

194

21

0

04 May 2023

Doc2SoarGraph: Discrete Reasoning over Visually-Rich Table-Text Documents via Semantic-Oriented Hierarchical Graphs

International Conference on Language Resources and Evaluation (LREC), 2023

217

7

0

03 May 2023

SelfDocSeg: A Self-Supervised vision-based Approach towards Document Segmentation

IEEE International Conference on Document Analysis and Recognition (ICDAR), 2023

Sanket Biswas

Siladittya Manna

Josep Lladós

Saumik Bhattacharya

176

10

0

01 May 2023

Information Redundancy and Biases in Public Document Information Extraction Benchmarks

IEEE International Conference on Document Analysis and Recognition (ICDAR), 2023

Pirashanth Ratnamogan

William Vanhuffel

144

2

0

28 Apr 2023

Evaluating Adversarial Robustness on Document Image Classification

IEEE International Conference on Document Analysis and Recognition (ICDAR), 2023

Timothée Fronteau

245

3

0

24 Apr 2023

GeoLayoutLM: Geometric Pre-training for Visual Information Extraction

Computer Vision and Pattern Recognition (CVPR), 2023

256

62

0

21 Apr 2023

CAVL: Learning Contrastive and Adaptive Representations of Vision and Language

Shentong Mo

199

1

0

10 Apr 2023

Context-Aware Classification of Legal Document Pages

Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2023

Pavlos Fragkogiannis

Martina Forster

148

6

0

05 Apr 2023

ChartReader: A Unified Framework for Chart Derendering and Comprehension without Heuristic Rules

IEEE International Conference on Computer Vision (ICCV), 2023

Teruko Mitamura

Alexander G. Hauptmann

215

28

0

05 Apr 2023

Modeling Entities as Semantic Points for Visual Information Extraction in the Wild

Computer Vision and Pattern Recognition (CVPR), 2023

Humen Zhong

172

28

0

23 Mar 2023

ST-KeyS: Self-Supervised Transformer for Keyword Spotting in Historical Handwritten Documents

Pattern Recognition (Pattern Recogn.), 2023

Sana Khamekhem Jemni

Mohamed Ali Souibgui

Yousri Kessentini

254

6

0

06 Mar 2023

StrucTexTv2: Masked Visual-Textual Prediction for Document Image Pre-training

International Conference on Learning Representations (ICLR), 2023

Chengquan Zhang

Xiaoqiang Zhang

Errui Ding

Jingdong Wang

185

53

0

01 Mar 2023

DocILE Benchmark for Document Information Localization and Extraction

IEEE International Conference on Document Analysis and Recognition (ICDAR), 2023

Štěpán Šimsa

Michal Uřičář

...

Matyáš Skalický

Antoine Doucet

Mickael Coustaty

Dimosthenis Karatzas

186

48

0

11 Feb 2023

LoRaLay: A Multilingual and Multimodal Dataset for Long Range and Layout-Aware Summarization

Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2023

Benjamin Piwowarski

188

14

0

26 Jan 2023