2106.11539
DocFormer: End-to-End Transformer for Document Understanding
IEEE International Conference on Computer Vision (ICCV), 2021
22 June 2021
Srikar Appalaraju
Bhavan A. Jasani
Bhargava Urala Kota
Yusheng Xie
R. Manmatha
ViT
Papers citing
"DocFormer: End-to-End Transformer for Document Understanding"
50 / 205 papers shown
ARIAL: An Agentic Framework for Document VQA with Precise Answer Localization
Ahmad Mohammadshirazi
Pinaki Prasad Guha Neogi
Dheeraj Kulshrestha
R. Ramnath
60
0
0
22 Nov 2025
TabRAG: Tabular Document Retrieval via Structured Language Representations
Jacob Si
Mike Qu
Michelle Lee
Yingzhen Li
LMTD
3DGS
3DV
227
0
0
10 Nov 2025
ColMate: Contrastive Late Interaction and Masked Text for Multimodal Document Retrieval
Ahmed Masry
Megh Thakkar
Patrice Bechard
Sathwik Tejaswi Madhusudhan
Rabiul Awal
...
Srivatsava Daruru
Enamul Hoque
Spandana Gella
Torsten Scholak
Sai Rajeswar
VLM
188
0
0
02 Nov 2025
Scaling Beyond Context: A Survey of Multimodal Retrieval-Augmented Generation for Document Understanding
Sensen Gao
Shanshan Zhao
Xu Jiang
Lunhao Duan
Yong Xien Chng
Qing-Guo Chen
Weihua Luo
Kaifu Zhang
Jia-Wang Bian
Mingming Gong
218
0
0
17 Oct 2025
Invoice Information Extraction: Methods and Performance Evaluation
Sai Yashwant
Anurag Dubey
Praneeth Paikray
Gantala Thulsiram
84
0
0
17 Oct 2025
Hybrid OCR-LLM Framework for Enterprise-Scale Document Information Extraction Under Copy-heavy Task
Zilong Wang
Xiaoyu Shen
48
0
0
11 Oct 2025
LLM/Agent-as-Data-Analyst: A Survey
Zirui Tang
Weizheng Wang
Z. Zhou
Yang Jiao
Bangrui Xu
...
Conghui He
Bin Wang
Xiaoyang Wang
Fan Wu
186
6
0
28 Sep 2025
OTCR: Optimal Transmission, Compression and Representation for Multimodal Information Extraction
Y. Li
Yajiao Wang
Wenhao Hu
Z. Zhang
Mengting Zhang
48
0
0
17 Sep 2025
Vector embedding of multi-modal texts: a tool for discovery?
Beth Plale
Sai Navya Jyesta
S. Withana
44
2
0
10 Sep 2025
Enhancing Document VQA Models via Retrieval-Augmented Generation
Eric López
Artemis Llabrés
Ernest Valveny
RALM
172
1
0
26 Aug 2025
Seeing Like a Designer Without One: A Study on Unsupervised Slide Quality Assessment via Designer Cue Augmentation
Tai Inui
Steven Oh
Magdeline Kuan
45
0
0
25 Aug 2025
Zero-shot Multimodal Document Retrieval via Cross-modal Question Generation
Yejin Choi
J. S. Park
Janghan Yoon
Saejin Kim
Jaehyun Jeon
Youngjae Yu
92
1
0
23 Aug 2025
From Surface to Semantics: Semantic Structure Parsing for Table-Centric Document Analysis
Xuan Li
Jialiang Dong
Raymond Wong
LMTD
99
0
0
14 Aug 2025
Zero-Shot Document Understanding using Pseudo Table of Contents-Guided Retrieval-Augmented Generation
Hyeon Seong Jeong
Sangwoo Jo
Byeong Hyun Yoon
Yoonseok Heo
Haedong Jeong
Taehoon Kim
RALM
VLM
116
0
0
31 Jul 2025
Describe Anything Model for Visual Question Answering on Text-rich Images
Yen-Linh Vu
Dinh-Thang Duong
Truong-Binh Duong
Anh-Khoi Nguyen
Thanh-Huy Nguyen
...
Jianhua Xing
Xingjian Li
Tianyang Wang
Ulas Bagci
Min Xu
VLM
223
2
0
16 Jul 2025
DocPolarBERT: A Pre-trained Model for Document Understanding with Relative Polar Coordinate Encoding of Layout Structures
Benno Uthayasooriyar
Antoine Ly
Franck Vermet
Caio Corro
264
0
0
11 Jul 2025
From Drawings to Decisions: A Hybrid Vision-Language Framework for Parsing 2D Engineering Drawings into Structured Manufacturing Knowledge
Muhammad Tayyab Khan
Lequn Chen
Zane Yong
Jun Ming Tan
Wenhe Feng
Seung Ki Moon
124
0
0
20 Jun 2025
Robustness Evaluation of OCR-based Visual Document Understanding under Multi-Modal Adversarial Attacks
Dong Nguyen Tien
Dung D. Le
AAML
186
0
0
19 Jun 2025
FormGym: Doing Paperwork with Agents
Matthew Toles
Rattandeep Singh
Isaac Song
Zhou Yu
89
0
0
17 Jun 2025
SimpleDoc: Multi-Modal Document Understanding with Dual-Cue Page Retrieval and Iterative Refinement
Chelsi Jain
Yiran Wu
Yifan Zeng
Jiale Liu
Shengyu Dai
Zhenwen Shao
Qingyun Wu
Huazheng Wang
167
6
0
16 Jun 2025
Multimodal Tabular Reasoning with Privileged Structured Information
Jun-Peng Jiang
Yu Xia
Hai-Long Sun
Shiyin Lu
Qing-Guo Chen
Weihua Luo
Kaifu Zhang
De-Chuan Zhan
Han-Jia Ye
LMTD
LRM
237
4
0
04 Jun 2025
Information Extraction from Visually Rich Documents using LLM-based Organization of Documents into Independent Textual Segments
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Aniket Bhattacharyya
Anurag Tripathi
Ujjal Das
Archan Karmakar
Amit Pathak
Maneesh Gupta
156
2
0
18 May 2025
Lost in OCR Translation? Vision-Based Approaches to Robust Document Retrieval
ACM Symposium on Document Engineering (DocEng), 2025
Alexander Buschmann Most
Joseph Winjum
Ayan Biswas
Shawn Jones
Nishath Rajiv Ranasinghe
Dan O’Malley
Manish Bhattarai
173
1
0
08 May 2025
Representation Learning for Tabular Data: A Comprehensive Survey
Jun-Peng Jiang
Si-Yang Liu
Hao-Run Cai
Qile Zhou
Han-Jia Ye
LMTD
341
15
0
17 Apr 2025
NoTeS-Bank: Benchmarking Neural Transcription and Search for Scientific Notes Understanding
Aniket Pal
Sanket Biswas
Alloy Das
Ayush Lodh
Priyanka Banerjee
Soumitri Chattopadhyay
Dimosthenis Karatzas
Josep Lladós
C. V. Jawahar
VLM
166
0
0
12 Apr 2025
Preserving Privacy Without Compromising Accuracy: Machine Unlearning for Handwritten Text Recognition
Lei Kang
Xuanshuo Fu
Lluís Gómez
Alicia Fornés
Ernest Valveny
Dimosthenis Karatzas
MU
301
0
0
11 Apr 2025
SmolVLM: Redefining small and efficient multimodal models
Andres Marafioti
Orr Zohar
Miquel Farré
Merve Noyan
Elie Bakouch
...
Hugo Larcher
Mathieu Morlon
Lewis Tunstall
Leandro von Werra
Thomas Wolf
VLM
386
105
0
07 Apr 2025
QID: Efficient Query-Informed ViTs in Data-Scarce Regimes for OCR-free Visual Document Understanding
Binh M. Le
Shaoyuan Xu
Jinmiao Fu
Zhishen Huang
Moyan Li
Yanhui Guo
Hongdong Li
Sameera Ramasinghe
Bryan Wang
259
0
0
03 Apr 2025
BiblioPage: A Dataset of Scanned Title Pages for Bibliographic Metadata Extraction
IEEE International Conference on Document Analysis and Recognition (ICDAR), 2025
Jan Kohút
Martin Dočekal
Michal Hradiš
Marek Vaško
167
0
0
25 Mar 2025
A Simple yet Effective Layout Token in Large Language Models for Document Understanding
Computer Vision and Pattern Recognition (CVPR), 2025
Zhaoqing Zhu
Chuwei Luo
Zirui Shao
Feiyu Gao
Hangdi Xing
Qi Zheng
Ji Zhang
250
6
0
24 Mar 2025
TextBite: A Historical Czech Document Dataset for Logical Page Segmentation
Martin Kostelník
Karel Beneš
Michal Hradiš
156
0
0
20 Mar 2025
KIEval: Evaluation Metric for Document Key Information Extraction
IEEE International Conference on Document Analysis and Recognition (ICDAR), 2025
Minsoo Khang
Sang Chul Jung
Sungrae Park
Teakgyu Hong
333
1
0
07 Mar 2025
LiGT: Layout-infused Generative Transformer for Visual Question Answering on Vietnamese Receipts
International Journal on Document Analysis and Recognition (IJDAR), 2025
Thanh-Phong Le
Trung Le Chi Phan
Nghia Hieu Nguyen
Kiet Van Nguyen
ViT
257
2
0
26 Feb 2025
Problem Solved? Information Extraction Design Space for Layout-Rich Documents using LLMs
Gaye Colakoglu
Gürkan Solmaz
Jonathan Fürst
273
4
0
25 Feb 2025
OmniParser V2: Structured-Points-of-Thought for Unified Visual Text Parsing and Its Generality to Multimodal Large Language Models
Wenwen Yu
Zhibo Yang
Jianqiang Wan
Sibo Song
J. Tang
Wenqing Cheng
Yunxing Liu
Xiang Bai
319
13
0
22 Feb 2025
Granite Vision: a lightweight, open-source multimodal model for enterprise Intelligence
Granite Vision Team
Leonid Karlinsky
Assaf Arbelle
Abraham Daniels
A. Nassar
...
Sriram Raghavan
Tanveer Syeda-Mahmood
Peter W. J. Staar
Tal Drory
Rogerio Feris
VLM
AI4TS
398
13
0
14 Feb 2025
Handwritten Text Recognition: A Survey
Carlos Garrido-Munoz
Antonio Ríos-Vila
Jorge Calvo-Zaragoza
287
5
0
12 Feb 2025
DocVLM: Make Your VLM an Efficient Reader
Computer Vision and Pattern Recognition (CVPR), 2024
Mor Shpigel Nacson
Aviad Aberdam
Roy Ganz
Elad Ben Avraham
Alona Golts
Yair Kittenplon
Shai Mazor
Ron Litman
VLM
569
0
0
11 Dec 2024
Hierarchical Visual Feature Aggregation for OCR-Free Document Understanding
Neural Information Processing Systems (NeurIPS), 2024
Jaeyoo Park
Jin Young Choi
Jeonghyung Park
Bohyung Han
VLM
123
7
0
08 Nov 2024
ReLayout: Towards Real-World Document Understanding via Layout-enhanced Pre-training
International Conference on Computational Linguistics (COLING), 2024
Zhouqiang Jiang
Bowen Wang
Junhao Chen
Yuta Nakashima
225
5
0
14 Oct 2024
Towards an Improved Metric for Evaluating Disentangled Representations
Sahib Julka
Yashu Wang
Michael Granitzer
162
4
0
04 Oct 2024
Modeling Layout Reading Order as Ordering Relations for Visually-rich Document Understanding
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Chong Zhang
Yi Tu
Yixi Zhao
Chenshu Yuan
Huan Chen
...
Mingxu Chai
Ya Guo
Huijia Zhu
Qi Zhang
Tao Gui
172
9
0
29 Sep 2024
See then Tell: Enhancing Key Information Extraction with Vision Grounding
Shuhang Liu
Zhenrong Zhang
Pengfei Hu
Jiefeng Ma
Jun Du
Qing Wang
Jianshu Zhang
Chenyu Liu
227
1
0
29 Sep 2024
DocMamba: Efficient Document Pre-training with State Space Model
AAAI Conference on Artificial Intelligence (AAAI), 2024
Pengfei Hu
Zhenrong Zhang
Jiefeng Ma
Shuhang Liu
Jun Du
Jianshu Zhang
Mamba
262
1
0
18 Sep 2024
READoc: A Unified Benchmark for Realistic Document Structured Extraction
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Zichao Li
Aizier Abulaiti
Yaojie Lu
Xuanang Chen
Jia Zheng
Hongyu Lin
Xianpei Han
Le Sun
359
6
0
08 Sep 2024
ViRED: Prediction of Visual Relations in Engineering Drawings
International Conference on Mobile Ad-hoc and Sensor Networks (ICMASN), 2024
Chao Gu
Ke Lin
Yiyang Luo
Jiahui Hou
Xiang-Yang Li
176
1
0
02 Sep 2024
μgat: Improving Single-Page Document Parsing by Providing Multi-Page Context
Fabio Quattrini
Carmine Zaccagnino
Silvia Cascianelli
Laura Righi
Rita Cucchiara
155
3
0
28 Aug 2024
DocLayLLM: An Efficient Multi-modal Extension of Large Language Models for Text-rich Document Understanding
Computer Vision and Pattern Recognition (CVPR), 2024
Wenhui Liao
Jiapeng Wang
Hongliang Li
Chengyu Wang
Jun Huang
Lianwen Jin
471
0
0
27 Aug 2024
Deep Learning based Visually Rich Document Content Understanding: A Survey
Muhammad Ali
Jean Lee
Salman Khan
Eduard Hovy
426
14
0
02 Aug 2024
VisFocus: Prompt-Guided Vision Encoders for OCR-Free Dense Document Understanding
Ofir Abramovich
Niv Nayman
Sharon Fogel
I. Lavi
Ron Litman
Shahar Tsiper
Royee Tichauer
Srikar Appalaraju
Shai Mazor
R. Manmatha
VLM
310
6
0
17 Jul 2024