A Survey of Deep Learning Approaches for OCR and Document Understanding

27 November 2020
Nishant Subramani
Alexandre Matton
Malcolm Greaves
Adrian Lam
arXiv:2011.13534

Papers citing "A Survey of Deep Learning Approaches for OCR and Document Understanding"

24 / 24 papers shown
Robustness of Structured Data Extraction from Perspectively Distorted Documents
Hyakka Nakada
Yoshiyasu Tanaka
69
1
0
18 Nov 2025
Scaling Beyond Context: A Survey of Multimodal Retrieval-Augmented Generation for Document Understanding
Sensen Gao
Shanshan Zhao
Xu Jiang
Lunhao Duan
Yong Xien Chng
Qing-Guo Chen
Weihua Luo
Kaifu Zhang
Jia-Wang Bian
Mingming Gong
425
4
0
17 Oct 2025
DocReward: A Document Reward Model for Structuring and Stylizing
Junpeng Liu
Yuzhong Zhao
Bowen Cao
Jiayu Ding
Yilin Jia
...
Sun Mao
FNU Kartik
Si-Qing Chen
W. Lam
Furu Wei
204
0
0
13 Oct 2025
Multi-Modal Vision vs. Text-Based Parsing: Benchmarking LLM Strategies for Invoice Processing
David Berghaus
Armin Berger
L. Hillebrand
K. Cvejoski
R. Sifa
129
2
0
29 Aug 2025
Finding Needles in Images: Can Multimodal LLMs Locate Fine Details?
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Parth Thakkar
Ankush Agarwal
Prasad Kasu
Pulkit Bansal
Chaitanya Devaguptapu
166
0
0
07 Aug 2025
OCRGenBench: A Comprehensive Benchmark for Evaluating OCR Generative Capabilities
Peirong Zhang
Haowei Xu
Jiaxin Zhang
Guitao Xu
Xuhan Zheng
Zhenhua Yang
Junle Liu
Yuyi Zhang
Lianwen Jin
377
4
0
20 Jul 2025
Towards Visual Text Grounding of Multimodal Large Language Model
Ming Li
Ruiyi Zhang
Jian Chen
Jiuxiang Gu
Franck Dernoncourt
Wanrong Zhu
Tianyi Zhou
Tong Sun
500
14
0
07 Apr 2025
Multimodal Large Language Models for Text-rich Image Understanding: A Comprehensive Review
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Pei Fu
Tongkun Guan
Zining Wang
Zhentao Guo
Chen Duan
...
Boming Chen
Jiayao Ma
Qianyi Jiang
Kai Zhou
Junfeng Luo
VLM
451
5
0
23 Feb 2025
VORTEX: A Spatial Computing Framework for Optimized Drone Telemetry Extraction from First-Person View Flight Data
James E. Gallagher
E. Oughton
207
0
0
24 Dec 2024
PerSRV: Personalized Sticker Retrieval with Vision-Language Model
The Web Conference (WWW), 2024
Heng Er Metilda Chee
Jiayin Wang
Zhiqiang Guo
Weizhi Ma
Min Zhang
228
2
0
29 Oct 2024
Towards an Improved Metric for Evaluating Disentangled Representations
Sahib Julka
Yashu Wang
Michael Granitzer
228
6
0
04 Oct 2024
Leveraging Distillation Techniques for Document Understanding: A Case Study with FLAN-T5
Jahrestagung der Gesellschaft für Informatik (GI Jahrestagung), 2024
Marcel Lamott
Muhammad Armaghan Shakir
231
5
0
17 Sep 2024
Image-to-LaTeX Converter for Mathematical Formulas and Text
Daniil Gurgurov
Aleksey Morshnev
ViT, VLM
263
4
0
07 Aug 2024
Deep Learning based Visually Rich Document Content Understanding: A Survey
Muhammad Ali
Jean Lee
Salman Khan
Eduard Hovy
567
21
0
02 Aug 2024
MixTex: Unambiguous Recognition Should Not Rely Solely on Real Data
Renqing Luo
Yuhan Xu
271
1
0
24 Jun 2024
Reconstructing training data from document understanding models
Jérémie Dentan
Arnaud Paran
A. Shabou
AAML, SyDa
358
3
0
05 Jun 2024
Transformers and Language Models in Form Understanding: A Comprehensive Review of Scanned Document Analysis
Abdelrahman Abdallah
Daniel Eberharter
Zoe Pfister
Adam Jatowt
283
17
0
06 Mar 2024
Handwritten and Printed Text Segmentation: A Signature Case Study
IEEE International Conference on Computer Vision (ICCV), 2023
Sina Gholamian
Ali Vahdat
188
4
0
15 Jul 2023
TransDocAnalyser: A Framework for Offline Semi-structured Handwritten Document Analysis in the Legal Domain
Sagar Chakraborty
Gaurav Harit
Saptarshi Ghosh
216
3
0
03 Jun 2023
Literature Review: Computer Vision Applications in Transportation Logistics and Warehousing
Alexander Naumann
Felix Hertlein
Laura Doerr
Steffen Thoma
K. Furmans
380
13
0
12 Apr 2023
Cleansing Jewel: A Neural Spelling Correction Model Built On Google OCR-ed Tibetan Manuscripts
Queenie Luo
Yung-Sung Chuang
251
3
0
07 Apr 2023
Towards Complex Document Understanding By Discrete Reasoning
ACM Multimedia (ACM MM), 2022
Fengbin Zhu
Wenqiang Lei
Fuli Feng
Chao Wang
Haozhou Zhang
Tat-Seng Chua
469
94
0
25 Jul 2022
Detection Masking for Improved OCR on Noisy Documents
Daniel Rotman
Ophir Azulai
Inbar Shapira
Yevgeny Burshtein
Udi Barzelay
271
5
0
17 May 2022
DeepCPCFG: Deep Learning and Context Free Grammars for End-to-End Information Extraction
IEEE International Conference on Document Analysis and Recognition (ICDAR), 2021
Freddy Chongtat Chua
Nigel P. Duffy
222
7
0
10 Mar 2021