DocFormer: End-to-End Transformer for Document Understanding
IEEE International Conference on Computer Vision (ICCV), 2021
22 June 2021
Srikar Appalaraju
Bhavan A. Jasani
Bhargava Urala Kota
Yusheng Xie
R. Manmatha
    ViT

Papers citing "DocFormer: End-to-End Transformer for Document Understanding"

50 / 205 papers shown
GenKIE: Robust Generative Multimodal Document Key Information Extraction
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Panfeng Cao
Ye Wang
Qiang Zhang
Zaiqiao Meng
SyDa
154
9
0
24 Oct 2023
Vision-Enhanced Semantic Entity Recognition in Document Images via Visually-Asymmetric Consistency Learning
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Hao Wang
Xiahua Chen
Rui Wang
Chenhui Chu
196
1
0
23 Oct 2023
PHD: Pixel-Based Language Modeling of Historical Documents
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Nadav Borenstein
Phillip Rust
Desmond Elliott
Isabelle Augenstein
275
6
0
22 Oct 2023
DSG: An End-to-End Document Structure Generator
Johannes Rausch
Gentiana Rashiti
Maxim Gusev
Ce Zhang
Stefan Feuerriegel
250
4
0
13 Oct 2023
ProtoNER: Few shot Incremental Learning for Named Entity Recognition using Prototypical Networks
Ritesh Kumar
Saurabh Goyal
Ashish Verma
Vatche Isahagian
196
5
0
03 Oct 2023
Analyzing the Efficacy of an LLM-Only Approach for Image-based Document Question Answering
Nidhi Hegde
S. Paul
Gagan Madan
Gaurav Aggarwal
220
9
0
25 Sep 2023
Document Understanding for Healthcare Referrals
IEEE International Conference on Healthcare Informatics (ICHI), 2023
Jimit Mistry
N. Arzeno
MedIm
95
1
0
22 Sep 2023
Kosmos-2.5: A Multimodal Literate Model
Tengchao Lv
Yupan Huang
Jingye Chen
Lei Cui
Shuming Ma
...
Weiyao Luo
Shaoxiang Wu
Guoxin Wang
Cha Zhang
Furu Wei
VLM, MLLM
260
89
0
20 Sep 2023
LMDX: Language Model-based Document Information Extraction and Localization
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Vincent Perot
Kai Kang
Florian Luisier
Guolong Su
Xiaoyu Sun
...
Zifeng Wang
Jiaqi Mu
Hao Zhang
Chen-Yu Lee
Nan Hua
227
52
0
19 Sep 2023
Vision Grid Transformer for Document Layout Analysis
IEEE International Conference on Computer Vision (ICCV), 2023
Cheng Da
Chuwei Luo
Qi Zheng
Cong Yao
ViT
237
52
0
29 Aug 2023
Nougat: Neural Optical Understanding for Academic Documents
International Conference on Learning Representations (ICLR), 2023
Lukas Blecher
Guillem Cucurull
Thomas Scialom
Robert Stojnic
ViT
203
178
0
25 Aug 2023
Beyond Document Page Classification: Design, Datasets, and Challenges
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Jordy Van Landeghem
Sanket Biswas
Matthew B. Blaschko
Marie-Francine Moens
212
9
0
24 Aug 2023
Enhancing Visually-Rich Document Understanding via Layout Structure Modeling
ACM Multimedia (ACM MM), 2023
Qiwei Li
Z. Li
Xiantao Cai
Bo Du
Hai Zhao
147
11
0
15 Aug 2023
RealCQA: Scientific Chart Question Answering as a Test-bed for First-Order Logic
IEEE International Conference on Document Analysis and Recognition (ICDAR), 2023
Saleem Ahmed
Bhavin Jawade
Shubham Pandey
S. Setlur
Venugopal Govindaraju
154
7
0
03 Aug 2023
SpaDen : Sparse and Dense Keypoint Estimation for Real-World Chart Understanding
IEEE International Conference on Document Analysis and Recognition (ICDAR), 2023
Saleem Ahmed
Pengyu Yan
David Doermann
S. Setlur
Venugopal Govindaraju
108
2
0
03 Aug 2023
A Real-World WebAgent with Planning, Long Context Understanding, and Program Synthesis
International Conference on Learning Representations (ICLR), 2023
Izzeddin Gur
Hiroki Furuta
Austin Huang
Mustafa Safdari
Yutaka Matsuo
Douglas Eck
Aleksandra Faust
LM&Ro, LLMAG
567
315
0
24 Jul 2023
DocTr: Document Transformer for Structured Information Extraction in Documents
IEEE International Conference on Computer Vision (ICCV), 2023
Haofu Liao
Aruni RoyChowdhury
Weijian Li
Ankan Bansal
Yuting Zhang
Zhuowen Tu
R. Satzoda
R. Manmatha
Vijay Mahadevan
196
22
0
16 Jul 2023
On Evaluation of Document Classification using RVL-CDIP
Stefan Larson
Gordon Lim
Kevin Leach
261
4
0
21 Jun 2023
DocumentNet: Bridging the Data Gap in Document Pre-Training
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Lijun Yu
Jin Miao
Xiaoyu Sun
Jiayi Chen
Alexander G. Hauptmann
H. Dai
Wei Wei
97
3
0
15 Jun 2023
DocumentCLIP: Linking Figures and Main Body Text in Reflowed Documents
Fuxiao Liu
Hao Tan
Chris Tensmeyer
CLIP, VLM
273
18
0
09 Jun 2023
ICDAR 2023 Competition on Structured Text Extraction from Visually-Rich Document Images
IEEE International Conference on Document Analysis and Recognition (ICDAR), 2023
Wenwen Yu
Chengquan Zhang
H. Cao
Wei Hua
Bohan Li
...
Hao Fei
Dimosthenis Karatzas
Xingchao Sun
Jingdong Wang
Xiang Bai
197
18
0
05 Jun 2023
DocFormerv2: Local Features for Document Understanding
AAAI Conference on Artificial Intelligence (AAAI), 2023
Srikar Appalaraju
Peng Tang
Qi Dong
Nishant Sankaran
Yichu Zhou
R. Manmatha
247
57
0
02 Jun 2023
End-to-End Document Classification and Key Information Extraction using Assignment Optimization
Ciaran Cooney
Joana Cavadas
Liam Madigan
Bradley Savage
Rachel Heyburn
Mairead O'Cuinn
181
1
0
01 Jun 2023
Layout and Task Aware Instruction Prompt for Zero-shot Document Image Question Answering
Wenjin Wang
Yunhao Li
Yixin Ou
Yin Zhang
VLM
403
34
0
01 Jun 2023
LayoutMask: Enhance Text-Layout Interaction in Multi-modal Pre-training for Document Understanding
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Yi Tu
Ya Guo
Huan Chen
Jinyang Tang
202
22
0
30 May 2023
Benchmarking Diverse-Modal Entity Linking with Generative Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Sijia Wang
Alexander Hanbo Li
He Zhu
Shenmin Zhang
Chung-Wei Hang
...
William Wang
Zhiguo Wang
Vittorio Castelli
Bing Xiang
Patrick Ng
VLM
285
12
0
27 May 2023
Visually-Situated Natural Language Understanding with Contrastive Reading Model and Frozen Large Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Geewook Kim
Hodong Lee
D. Kim
Haeji Jung
S. Park
Yoon Kim
Sangdoo Yun
Taeho Kil
Bado Lee
Seunghyun Park
VLM
345
4
0
24 May 2023
Towards Few-shot Entity Recognition in Document Images: A Graph Neural Network Approach Robust to Image Manipulation
International Conference on Language Resources and Evaluation (LREC), 2023
Prashant Krishnan
Zilong Wang
Yangkun Wang
Jingbo Shang
253
3
0
24 May 2023
DUBLIN -- Document Understanding By Language-Image Network
Kriti Aggarwal
Aditi Khandelwal
Kumar Tanmay
Owais Mohammed Khan
Qiang Liu
Monojit Choudhury
Hardik Hansrajbhai Chauhan
Subhojit Som
Vishrav Chaudhary
Saurabh Tiwary
ObjD, VLM
310
0
0
23 May 2023
Global Structure Knowledge-Guided Relation Extraction Method for Visually-Rich Document
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Xiangnan Chen
Qianwen Xiao
Juncheng Li
Duo Dong
Jun Lin
Xiaozhong Liu
Siliang Tang
211
6
0
23 May 2023
Multimodal Web Navigation with Instruction-Finetuned Foundation Models
International Conference on Learning Representations (ICLR), 2023
Hiroki Furuta
Kuang-Huei Lee
Ofir Nachum
Yutaka Matsuo
Aleksandra Faust
S. Gu
Izzeddin Gur
LM&Ro
413
142
0
19 May 2023
Fast-StrucTexT: An Efficient Hourglass Transformer with Modality-guided Dynamic Token Merge for Document Understanding
International Joint Conference on Artificial Intelligence (IJCAI), 2023
Mingliang Zhai
Yulin Li
Xiameng Qin
Chen Yi
Qunyi Xie
Chengquan Zhang
Kun Yao
Yuwei Wu
Yunde Jia
144
8
0
19 May 2023
Sequence-to-Sequence Pre-training with Unified Modality Masking for Visual Document Understanding
ShuWei Feng
Tianyang Zhan
Zhanming Jie
Trung Quoc Luong
Xiaoran Jin
105
2
0
16 May 2023
Document Understanding Dataset and Evaluation (DUDE)
IEEE International Conference on Computer Vision (ICCV), 2023
Jordy Van Landeghem
Rubèn Pérez Tito
Łukasz Borchmann
Michal Pietruszka
Pawel Józiak
...
Bertrand Ackaert
Ernest Valveny
Matthew Blaschko
Sien Moens
Tomasz Stanislawek
VGen
298
109
0
15 May 2023
SwinDocSegmenter: An End-to-End Unified Domain Adaptive Transformer for Document Instance Segmentation
IEEE International Conference on Document Analysis and Recognition (ICDAR), 2023
Ayan Banerjee
Sanket Biswas
Josep Lladós
Umapada Pal
ViT
189
21
0
08 May 2023
Text Reading Order in Uncontrolled Conditions by Sparse Graph Segmentation
IEEE International Conference on Document Analysis and Recognition (ICDAR), 2023
Renshen Wang
Yasuhisa Fujii
Alessandro Bissacco
GNN
132
7
0
04 May 2023
FormNetV2: Multimodal Graph Contrastive Learning for Form Document Information Extraction
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Nils Loose
Chun-Liang Li
Hao Zhang
Timothy Dozat
Felix Mächtle
...
Shangbang Long
Siyang Qin
Yasuhisa Fujii
Nan Hua
T. Eisenbarth
SSL
194
21
0
04 May 2023
Doc2SoarGraph: Discrete Reasoning over Visually-Rich Table-Text Documents via Semantic-Oriented Hierarchical Graphs
International Conference on Language Resources and Evaluation (LREC), 2023
Fengbin Zhu
Chao Wang
Fuli Feng
Zifeng Ren
Moxin Li
Tat-Seng Chua
217
7
0
03 May 2023
SelfDocSeg: A Self-Supervised vision-based Approach towards Document Segmentation
IEEE International Conference on Document Analysis and Recognition (ICDAR), 2023
Subhajit Maity
Sanket Biswas
Siladittya Manna
Ayan Banerjee
Josep Lladós
Saumik Bhattacharya
Umapada Pal
176
10
0
01 May 2023
Information Redundancy and Biases in Public Document Information Extraction Benchmarks
IEEE International Conference on Document Analysis and Recognition (ICDAR), 2023
S. Laatiri
Pirashanth Ratnamogan
Joel Tang
Laurent Lam
William Vanhuffel
Fabien Caspani
144
2
0
28 Apr 2023
Evaluating Adversarial Robustness on Document Image Classification
IEEE International Conference on Document Analysis and Recognition (ICDAR), 2023
Timothée Fronteau
Arnaud Paran
A. Shabou
AAML
245
3
0
24 Apr 2023
GeoLayoutLM: Geometric Pre-training for Visual Information Extraction
Computer Vision and Pattern Recognition (CVPR), 2023
Chuwei Luo
Changxu Cheng
Qi Zheng
Cong Yao
256
62
0
21 Apr 2023
CAVL: Learning Contrastive and Adaptive Representations of Vision and Language
Shentong Mo
Jingfei Xia
Ihor Markevych
CLIP, VLM
199
1
0
10 Apr 2023
Context-Aware Classification of Legal Document Pages
Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2023
Pavlos Fragkogiannis
Martina Forster
Grace E. Lee
Dell Zhang
148
6
0
05 Apr 2023
ChartReader: A Unified Framework for Chart Derendering and Comprehension without Heuristic Rules
IEEE International Conference on Computer Vision (ICCV), 2023
Zhi-Qi Cheng
Qianwen Dai
Siyao Li
Yuxuan Zhou
Teruko Mitamura
Alexander G. Hauptmann
215
28
0
05 Apr 2023
Modeling Entities as Semantic Points for Visual Information Extraction in the Wild
Computer Vision and Pattern Recognition (CVPR), 2023
Zhibo Yang
Rujiao Long
Pengfei Wang
Sibo Song
Humen Zhong
Wenqing Cheng
X. Bai
Cong Yao
172
28
0
23 Mar 2023
ST-KeyS: Self-Supervised Transformer for Keyword Spotting in Historical Handwritten Documents
Pattern Recognition (Pattern Recogn.), 2023
Sana Khamekhem Jemni
Sourour Ammar
Mohamed Ali Souibgui
Yousri Kessentini
A. Cheddad
254
6
0
06 Mar 2023
StrucTexTv2: Masked Visual-Textual Prediction for Document Image Pre-training
International Conference on Learning Representations (ICLR), 2023
Yu Yu
Yulin Li
Chengquan Zhang
Xiaoqiang Zhang
Zengyuan Guo
Xiameng Qin
Kun Yao
Junyu Han
Errui Ding
Jingdong Wang
185
53
0
01 Mar 2023
DocILE Benchmark for Document Information Localization and Extraction
IEEE International Conference on Document Analysis and Recognition (ICDAR), 2023
Štěpán Šimsa
Milan Šulc
Michal Uřičář
Yash J. Patel
Ahmed Hamdi
...
Matyáš Skalický
Jiří Matas
Antoine Doucet
Mickael Coustaty
Dimosthenis Karatzas
186
48
0
11 Feb 2023
LoRaLay: A Multilingual and Multimodal Dataset for Long Range and Layout-Aware Summarization
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2023
Laura Nguyen
Thomas Scialom
Benjamin Piwowarski
Jacopo Staiano
188
14
0
26 Jan 2023