DocFormer: End-to-End Transformer for Document Understanding

IEEE International Conference on Computer Vision (ICCV), 2021

22 June 2021

Bhargava Urala Kota

Papers citing "DocFormer: End-to-End Transformer for Document Understanding"

50 / 205 papers shown

ARIAL: An Agentic Framework for Document VQA with Precise Answer Localization

Ahmad Mohammadshirazi

Pinaki Prasad Guha Neogi

Dheeraj Kulshrestha

R. Ramnath

22 Nov 2025

TabRAG: Tabular Document Retrieval via Structured Language Representations

10 Nov 2025

ColMate: Contrastive Late Interaction and Masked Text for Multimodal Document Retrieval

Ahmed Masry

Megh Thakkar

Patrice Bechard

Sathwik Tejaswi Madhusudhan

...

02 Nov 2025

Scaling Beyond Context: A Survey of Multimodal Retrieval-Augmented Generation for Document Understanding

17 Oct 2025

Invoice Information Extraction: Methods and Performance Evaluation

17 Oct 2025

Hybrid OCR-LLM Framework for Enterprise-Scale Document Information Extraction Under Copy-heavy Task

Zilong Wang

Xiaoyu Shen

11 Oct 2025

LLM/Agent-as-Data-Analyst: A Survey

...

28 Sep 2025

OTCR: Optimal Transmission, Compression and Representation for Multimodal Information Extraction

17 Sep 2025

Vector embedding of multi-modal texts: a tool for discovery?

Beth Plale

Sai Navya Jyesta

S. Withana

10 Sep 2025

Enhancing Document VQA Models via Retrieval-Augmented Generation

26 Aug 2025

Seeing Like a Designer Without One: A Study on Unsupervised Slide Quality Assessment via Designer Cue Augmentation

Tai Inui

Steven Oh

Magdeline Kuan

25 Aug 2025

Zero-shot Multimodal Document Retrieval via Cross-modal Question Generation

23 Aug 2025

From Surface to Semantics: Semantic Structure Parsing for Table-Centric Document Analysis

14 Aug 2025

Zero-Shot Document Understanding using Pseudo Table of Contents-Guided Retrieval-Augmented Generation

31 Jul 2025

Describe Anything Model for Visual Question Answering on Text-rich Images

...

16 Jul 2025

DocPolarBERT: A Pre-trained Model for Document Understanding with Relative Polar Coordinate Encoding of Layout Structures

11 Jul 2025

From Drawings to Decisions: A Hybrid Vision-Language Framework for Parsing 2D Engineering Drawings into Structured Manufacturing Knowledge

20 Jun 2025

Robustness Evaluation of OCR-based Visual Document Understanding under Multi-Modal Adversarial Attacks

Dong Nguyen Tien

Dung D. Le

19 Jun 2025

FormGym: Doing Paperwork with Agents

Matthew Toles

Rattandeep Singh

Isaac Song

Zhou Yu

17 Jun 2025

SimpleDoc: Multi-Modal Document Understanding with Dual-Cue Page Retrieval and Iterative Refinement

16 Jun 2025

Multimodal Tabular Reasoning with Privileged Structured Information

04 Jun 2025

Information Extraction from Visually Rich Documents using LLM-based Organization of Documents into Independent Textual Segments

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

18 May 2025

Lost in OCR Translation? Vision-Based Approaches to Robust Document Retrieval

ACM Symposium on Document Engineering (DocEng), 2025

Alexander Buschmann Most

Joseph Winjum

Ayan Biswas

Shawn Jones

Nishath Rajiv Ranasinghe

Dan O’Malley

Manish Bhattarai

08 May 2025

Representation Learning for Tabular Data: A Comprehensive Survey

17 Apr 2025

NoTeS-Bank: Benchmarking Neural Transcription and Search for Scientific Notes Understanding

Soumitri Chattopadhyay

12 Apr 2025

Preserving Privacy Without Compromising Accuracy: Machine Unlearning for Handwritten Text Recognition

11 Apr 2025

SmolVLM: Redefining small and efficient multimodal models

...

07 Apr 2025

QID: Efficient Query-Informed ViTs in Data-Scarce Regimes for OCR-free Visual Document Understanding

03 Apr 2025

BiblioPage: A Dataset of Scanned Title Pages for Bibliographic Metadata Extraction

IEEE International Conference on Document Analysis and Recognition (ICDAR), 2025

25 Mar 2025

A Simple yet Effective Layout Token in Large Language Models for Document Understanding

Computer Vision and Pattern Recognition (CVPR), 2025

24 Mar 2025

TextBite: A Historical Czech Document Dataset for Logical Page Segmentation

Martin Kostelník

Karel Beneš

Michal Hradiš

20 Mar 2025

KIEval: Evaluation Metric for Document Key Information Extraction

IEEE International Conference on Document Analysis and Recognition (ICDAR), 2025

07 Mar 2025

LiGT: Layout-infused Generative Transformer for Visual Question Answering on Vietnamese Receipts

International Journal on Document Analysis and Recognition (IJDAR), 2025

26 Feb 2025

Problem Solved? Information Extraction Design Space for Layout-Rich Documents using LLMs

Gaye Colakoglu

Gürkan Solmaz

Jonathan Fürst

25 Feb 2025

OmniParser V2: Structured-Points-of-Thought for Unified Visual Text Parsing and Its Generality to Multimodal Large Language Models

22 Feb 2025

Granite Vision: a lightweight, open-source multimodal model for enterprise Intelligence

...

Tanveer Syeda-Mahmood

14 Feb 2025

Handwritten Text Recognition: A Survey

Carlos Garrido-Munoz

Antonio Ríos-Vila

Jorge Calvo-Zaragoza

12 Feb 2025

DocVLM: Make Your VLM an Efficient Reader

Computer Vision and Pattern Recognition (CVPR), 2024

11 Dec 2024

Hierarchical Visual Feature Aggregation for OCR-Free Document Understanding

Neural Information Processing Systems (NeurIPS), 2024

08 Nov 2024

ReLayout: Towards Real-World Document Understanding via Layout-enhanced Pre-training

International Conference on Computational Linguistics (COLING), 2024

14 Oct 2024

Towards an Improved Metric for Evaluating Disentangled Representations

Sahib Julka

Yashu Wang

Michael Granitzer

04 Oct 2024

Modeling Layout Reading Order as Ordering Relations for Visually-rich Document Understanding

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024

Huan Chen

...

Qi Zhang

29 Sep 2024

See then Tell: Enhancing Key Information Extraction with Vision Grounding

29 Sep 2024

DocMamba: Efficient Document Pre-training with State Space Model

AAAI Conference on Artificial Intelligence (AAAI), 2024

Pengfei Hu

Jiefeng Ma

18 Sep 2024

READoc: A Unified Benchmark for Realistic Document Structured Extraction

Annual Meeting of the Association for Computational Linguistics (ACL), 2024

08 Sep 2024

ViRED: Prediction of Visual Relations in Engineering Drawings

International Conference on Mobile Ad-hoc and Sensor Networks (ICMASN), 2024

02 Sep 2024

μgat: Improving Single-Page Document Parsing by Providing Multi-Page Context

Fabio Quattrini

Carmine Zaccagnino

Silvia Cascianelli

Laura Righi

Rita Cucchiara

28 Aug 2024

DocLayLLM: An Efficient Multi-modal Extension of Large Language Models for Text-rich Document Understanding

Computer Vision and Pattern Recognition (CVPR), 2024

Jun Huang

27 Aug 2024

Deep Learning based Visually Rich Document Content Understanding: A Survey

02 Aug 2024

VisFocus: Prompt-Guided Vision Encoders for OCR-Free Dense Document Understanding

17 Jul 2024