
DocFormer: End-to-End Transformer for Document Understanding

IEEE International Conference on Computer Vision (ICCV), 2021

22 June 2021

Bhargava Urala Kota

Papers citing "DocFormer: End-to-End Transformer for Document Understanding"

50 / 205 papers shown

ARIAL: An Agentic Framework for Document VQA with Precise Answer Localization Ahmad Mohammadshirazi Pinaki Prasad Guha Neogi Dheeraj Kulshrestha R. Ramnath 60 0 0 22 Nov 2025
TabRAG: Tabular Document Retrieval via Structured Language Representations Jacob Si Mike Qu Michelle Lee Yingzhen Li LMTD 3DGS 3DV 227 0 0 10 Nov 2025
ColMate: Contrastive Late Interaction and Masked Text for Multimodal Document Retrieval Ahmed Masry Megh Thakkar Patrice Bechard Sathwik Tejaswi Madhusudhan Rabiul Awal ... Srivatsava Daruru Enamul Hoque Spandana Gella Torsten Scholak Sai Rajeswar VLM 188 0 0 02 Nov 2025
Scaling Beyond Context: A Survey of Multimodal Retrieval-Augmented Generation for Document Understanding Sensen Gao Shanshan Zhao Xu Jiang Lunhao Duan Yong Xien Chng Qing-Guo Chen Weihua Luo Kaifu Zhang Jia-Wang Bian Mingming Gong 218 0 0 17 Oct 2025
Invoice Information Extraction: Methods and Performance Evaluation Sai Yashwant Anurag Dubey Praneeth Paikray Gantala Thulsiram 84 0 0 17 Oct 2025
Hybrid OCR-LLM Framework for Enterprise-Scale Document Information Extraction Under Copy-heavy Task Zilong Wang Xiaoyu Shen 48 0 0 11 Oct 2025
LLM/Agent-as-Data-Analyst: A Survey Zirui Tang Weizheng Wang Z. Zhou Yang Jiao Bangrui Xu ... Conghui He Bin Wang Conghui He Xiaoyang Wang Fan Wu 186 6 0 28 Sep 2025
OTCR: Optimal Transmission, Compression and Representation for Multimodal Information Extraction Y. Li Yajiao Wang Wenhao Hu Z. Zhang Mengting Zhang 48 0 0 17 Sep 2025
Vector embedding of multi-modal texts: a tool for discovery? Beth Plale Sai Navya Jyesta S. Withana 44 2 0 10 Sep 2025
Enhancing Document VQA Models via Retrieval-Augmented Generation Eric López Artemis LLabres Ernest Valveny RALM 172 1 0 26 Aug 2025
Seeing Like a Designer Without One: A Study on Unsupervised Slide Quality Assessment via Designer Cue Augmentation Tai Inui Steven Oh Magdeline Kuan 45 0 0 25 Aug 2025
Zero-shot Multimodal Document Retrieval via Cross-modal Question Generation Yejin Choi J. S. Park Janghan Yoon Saejin Kim Jaehyun Jeon Youngjae Yu 92 1 0 23 Aug 2025
From Surface to Semantics: Semantic Structure Parsing for Table-Centric Document Analysis Xuan Li Jialiang Dong Raymond Wong LMTD 99 0 0 14 Aug 2025
Zero-Shot Document Understanding using Pseudo Table of Contents-Guided Retrieval-Augmented Generation Hyeon Seong Jeong Sangwoo Jo Byeong Hyun Yoon Yoonseok Heo Haedong Jeong Taehoon Kim RALM VLM 116 0 0 31 Jul 2025
Describe Anything Model for Visual Question Answering on Text-rich Images Yen-Linh Vu Dinh-Thang Duong Truong-Binh Duong Anh-Khoi Nguyen Thanh-Huy Nguyen ... Jianhua Xing Xingjian Li Tianyang Wang Ulas Bagci Min Xu VLM 223 2 0 16 Jul 2025
DocPolarBERT: A Pre-trained Model for Document Understanding with Relative Polar Coordinate Encoding of Layout Structures Benno Uthayasooriyar Antoine Ly Franck Vermet Caio Corro 264 0 0 11 Jul 2025
From Drawings to Decisions: A Hybrid Vision-Language Framework for Parsing 2D Engineering Drawings into Structured Manufacturing Knowledge Muhammad Tayyab Khan Lequn Chen Zane Yong Jun Ming Tan Wenhe Feng Seung Ki Moon 124 0 0 20 Jun 2025
Robustness Evaluation of OCR-based Visual Document Understanding under Multi-Modal Adversarial Attacks Dong Nguyen Tien Dung D. Le AAML 186 0 0 19 Jun 2025
FormGym: Doing Paperwork with Agents Matthew Toles Rattandeep Singh Isaac Song Zhou Yu 89 0 0 17 Jun 2025
SimpleDoc: Multi-Modal Document Understanding with Dual-Cue Page Retrieval and Iterative Refinement Chelsi Jain Yiran Wu Yifan Zeng Jiale Liu Shengyu Dai Zhenwen Shao Qingyun Wu Huazheng Wang 167 6 0 16 Jun 2025
Multimodal Tabular Reasoning with Privileged Structured Information Jun-Peng Jiang Yu Xia Hai-Long Sun Shiyin Lu Qing-Guo Chen Weihua Luo Kaifu Zhang De-Chuan Zhan Han-Jia Ye LMTD LRM 237 4 0 04 Jun 2025
Information Extraction from Visually Rich Documents using LLM-based Organization of Documents into Independent Textual Segments. Annual Meeting of the Association for Computational Linguistics (ACL), 2025 Aniket Bhattacharyya Anurag Tripathi Ujjal Das Archan Karmakar Amit Pathak Maneesh Gupta 156 2 0 18 May 2025
Lost in OCR Translation? Vision-Based Approaches to Robust Document Retrieval. ACM Symposium on Document Engineering (DocEng), 2025 Alexander Buschmann Most Joseph Winjum Ayan Biswas Shawn Jones Nishath Rajiv Ranasinghe Dan O’Malley Manish Bhattarai 173 1 0 08 May 2025
Representation Learning for Tabular Data: A Comprehensive Survey Jun-Peng Jiang Si-Yang Liu Hao-Run Cai Qile Zhou Han-Jia Ye LMTD 341 15 0 17 Apr 2025
NoTeS-Bank: Benchmarking Neural Transcription and Search for Scientific Notes Understanding Aniket Pal Sanket Biswas Alloy Das Ayush Lodh Priyanka Banerjee Soumitri Chattopadhyay Dimosthenis Karatzas Josep Lladós C. V. Jawahar VLM 166 0 0 12 Apr 2025
Preserving Privacy Without Compromising Accuracy: Machine Unlearning for Handwritten Text Recognition Lei Kang Xuanshuo Fu Lluís Gómez Alicia Fornés Ernest Valveny Dimosthenis Karatzas MU 301 0 0 11 Apr 2025
SmolVLM: Redefining small and efficient multimodal models Andres Marafioti Orr Zohar Miquel Farré Merve Noyan Elie Bakouch ... Hugo Larcher Mathieu Morlon Lewis Tunstall Leandro von Werra Thomas Wolf VLM 386 105 0 07 Apr 2025
QID: Efficient Query-Informed ViTs in Data-Scarce Regimes for OCR-free Visual Document Understanding Binh M. Le Shaoyuan Xu Jinmiao Fu Zhishen Huang Moyan Li Yanhui Guo Hongdong Li Sameera Ramasinghe Bryan Wang 259 0 0 03 Apr 2025
BiblioPage: A Dataset of Scanned Title Pages for Bibliographic Metadata Extraction. IEEE International Conference on Document Analysis and Recognition (ICDAR), 2025 Jan Kohút Martin Dočekal Michal Hradiš Marek Vaško 167 0 0 25 Mar 2025
A Simple yet Effective Layout Token in Large Language Models for Document Understanding. Computer Vision and Pattern Recognition (CVPR), 2025 Zhaoqing Zhu Chuwei Luo Zirui Shao Feiyu Gao Hangdi Xing Qi Zheng Ji Zhang 250 6 0 24 Mar 2025
TextBite: A Historical Czech Document Dataset for Logical Page Segmentation Martin Kostelník Karel Beneš Michal Hradiš 156 0 0 20 Mar 2025
KIEval: Evaluation Metric for Document Key Information Extraction. IEEE International Conference on Document Analysis and Recognition (ICDAR), 2025 Minsoo Khang Sang Chul Jung Sungrae Park Teakgyu Hong 333 1 0 07 Mar 2025
LiGT: Layout-infused Generative Transformer for Visual Question Answering on Vietnamese Receipts. International Journal on Document Analysis and Recognition (IJDAR), 2025 Thanh-Phong Le Trung Le Chi Phan Nghia Hieu Nguyen Kiet Van Nguyen ViT 257 2 0 26 Feb 2025
Problem Solved? Information Extraction Design Space for Layout-Rich Documents using LLMs Gaye Colakoglu Gürkan Solmaz Jonathan Fürst 273 4 0 25 Feb 2025
OmniParser V2: Structured-Points-of-Thought for Unified Visual Text Parsing and Its Generality to Multimodal Large Language Models Wenwen Yu Zhibo Yang Jianqiang Wan Sibo Song J. Tang Wenqing Cheng Yunxing Liu Xiang Bai 319 13 0 22 Feb 2025
Granite Vision: a lightweight, open-source multimodal model for enterprise Intelligence Granite Vision Team Leonid Karlinsky Assaf Arbelle Abraham Daniels A. Nassar ... Sriram Raghavan Tanveer Syeda-Mahmood Peter W. J. Staar Tal Drory Rogerio Feris VLM AI4TS 398 13 0 14 Feb 2025
Handwritten Text Recognition: A Survey Carlos Garrido-Munoz Antonio Ríos-Vila Jorge Calvo-Zaragoza 287 5 0 12 Feb 2025
DocVLM: Make Your VLM an Efficient Reader. Computer Vision and Pattern Recognition (CVPR), 2024 Mor Shpigel Nacson Aviad Aberdam Roy Ganz Elad Ben Avraham Alona Golts Yair Kittenplon Shai Mazor Ron Litman VLM 569 0 0 11 Dec 2024
Hierarchical Visual Feature Aggregation for OCR-Free Document Understanding. Neural Information Processing Systems (NeurIPS), 2024 Jaeyoo Park Jin Young Choi Jeonghyung Park Bohyung Han VLM 123 7 0 08 Nov 2024
ReLayout: Towards Real-World Document Understanding via Layout-enhanced Pre-training. International Conference on Computational Linguistics (COLING), 2024 Zhouqiang Jiang Bowen Wang Junhao Chen Yuta Nakashima 225 5 0 14 Oct 2024
Towards an Improved Metric for Evaluating Disentangled Representations Sahib Julka Yashu Wang Michael Granitzer 162 4 0 04 Oct 2024
Modeling Layout Reading Order as Ordering Relations for Visually-rich Document Understanding. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024 Chong Zhang Yi Tu Yixi Zhao Chenshu Yuan Huan Chen ... Mingxu Chai Ya Guo Huijia Zhu Qi Zhang Tao Gui 172 9 0 29 Sep 2024
See then Tell: Enhancing Key Information Extraction with Vision Grounding Shuhang Liu Zhenrong Zhang Pengfei Hu Jiefeng Ma Jun Du Qing Wang Jianshu Zhang Chenyu Liu 227 1 0 29 Sep 2024
DocMamba: Efficient Document Pre-training with State Space Model. AAAI Conference on Artificial Intelligence (AAAI), 2024 Pengfei Hu Zhenrong Zhang Jiefeng Ma Shuhang Liu Jun Du Jianshu Zhang Mamba 262 1 0 18 Sep 2024
READoc: A Unified Benchmark for Realistic Document Structured Extraction. Annual Meeting of the Association for Computational Linguistics (ACL), 2024 Zichao Li Aizier Abulaiti Yaojie Lu Xuanang Chen Jia Zheng Hongyu Lin Xianpei Han Le Sun 359 6 0 08 Sep 2024
ViRED: Prediction of Visual Relations in Engineering Drawings. International Conference on Mobile Ad-hoc and Sensor Networks (ICMASN), 2024 Chao Gu Ke Lin Yiyang Luo Jiahui Hou Xiang-Yang Li 176 1 0 02 Sep 2024
μgat: Improving Single-Page Document Parsing by Providing Multi-Page Context Fabio Quattrini Carmine Zaccagnino Silvia Cascianelli Laura Righi Rita Cucchiara 155 3 0 28 Aug 2024
DocLayLLM: An Efficient Multi-modal Extension of Large Language Models for Text-rich Document Understanding. Computer Vision and Pattern Recognition (CVPR), 2024 Wenhui Liao Jiapeng Wang Hongliang Li Chengyu Wang Jun Huang Lianwen Jin 471 0 0 27 Aug 2024
Deep Learning based Visually Rich Document Content Understanding: A Survey Muhammad Ali Jean Lee Salman Khan Eduard Hovy 426 14 0 02 Aug 2024
VisFocus: Prompt-Guided Vision Encoders for OCR-Free Dense Document Understanding Ofir Abramovich Niv Nayman Sharon Fogel I. Lavi Ron Litman Shahar Tsiper Royee Tichauer Srikar Appalaraju Shai Mazor R. Manmatha VLM 310 6 0 17 Jul 2024
