ICDAR2019 Competition on Scanned Receipt OCR and Information Extraction

IEEE International Conference on Document Analysis and Recognition (ICDAR), 2019
18 March 2021
Zheng Huang
Kai Chen
Jianhua He
X. Bai
Dimosthenis Karatzas
Shijian Lu
C. V. Jawahar

Papers citing "ICDAR2019 Competition on Scanned Receipt OCR and Information Extraction"

50 / 219 papers shown
DocVAL: Validated Chain-of-Thought Distillation for Grounded Document VQA
Ahmad Mohammadshirazi
Pinaki Prasad Guha Neogi
Dheeraj Kulshrestha
R. Ramnath
VGen
138
0
0
27 Nov 2025
MGA-VQA: Secure and Interpretable Graph-Augmented Visual Question Answering with Memory-Guided Protection Against Unauthorized Knowledge Use
Ahmad Mohammadshirazi
Pinaki Prasad Guha Neogi
Dheeraj Kulshrestha
R. Ramnath
107
0
0
22 Nov 2025
ARIAL: An Agentic Framework for Document VQA with Precise Answer Localization
Ahmad Mohammadshirazi
Pinaki Prasad Guha Neogi
Dheeraj Kulshrestha
R. Ramnath
121
0
0
22 Nov 2025
VisReason: A Large-Scale Dataset for Visual Chain-of-Thought Reasoning
Lingxiao Li
Y. Wang
Xinyan Gao
Chen Tang
Xiangyu Yue
Chenyu You
LRM
80
1
0
21 Nov 2025
GranViT: A Fine-Grained Vision Model With Autoregressive Perception For MLLMs
Guanghao Zheng
Bowen Shi
Mingxing Xu
Ruoyu Sun
Peisen Zhao
...
Wenrui Dai
Junni Zou
Hongkai Xiong
Xiaopeng Zhang
Qi Tian
VLM
161
0
0
23 Oct 2025
Unified Reinforcement and Imitation Learning for Vision-Language Models
Byung-Kwan Lee
Ryo Hachiuma
Yong Man Ro
Yu-Chun Wang
Yueh-Hua Wu
VLM
160
2
0
22 Oct 2025
FineVision: Open Data Is All You Need
Luis Wiedmann
Orr Zohar
Amir Mahla
Xiaohan Wang
Rui Li
Thibaud Frere
Leandro von Werra
Aritra Roy Gosthipaty
Andrés Marafioti
VLM
196
13
0
20 Oct 2025
Seeing but Not Believing: Probing the Disconnect Between Visual Attention and Answer Correctness in VLMs
Zhining Liu
Ziyi Chen
Hui Liu
Chen Luo
Xianfeng Tang
...
Zhenwei Dai
Zhan Shi
Tianxin Wei
Benoit Dumoulin
Hanghang Tong
LRM
136
3
0
20 Oct 2025
Document Intelligence in the Era of Large Language Models: A Survey
Weishi Wang
Hengchang Hu
Zhijie Zhang
Zhaochen Li
Hongxin Shao
Daniel Dahlmeier
AI4TS
193
1
0
15 Oct 2025
InteractiveOmni: A Unified Omni-modal Model for Audio-Visual Multi-turn Dialogue
Wenwen Tong
Hewei Guo
Dongchuan Ran
Jiangnan Chen
Jiefan Lu
...
Dinghao Zhou
Guiping Zhong
Ken Zheng
Shiyin Kang
Lewei Lu
MLLM AuLLM VGen VLM
432
4
0
15 Oct 2025
Exploring OCR-augmented Generation for Bilingual VQA
JoonHo Lee
Sunho Park
VLM
116
0
0
02 Oct 2025
FlexDoc: Parameterized Sampling for Diverse Multilingual Synthetic Documents for Training Document Understanding Models
Karan Dua
Hitesh Laxmichand Patel
Puneet Mittal
Ranjeet Gupta
Amit Agarwal
Praneet Pabolu
Srikant Panda
Hansa Meghwani
Graham Horwood
Fahad Shah
SyDa
138
1
0
02 Oct 2025
Visual CoT Makes VLMs Smarter but More Fragile
Chunxue Xu
Yiwei Wang
Yujun Cai
Bryan Hooi
Songze Li
MLLM LRM
147
0
0
28 Sep 2025
AgenticIE: An Adaptive Agent for Information Extraction from Complex Regulatory Documents
Gaye Colakoglu
Gürkan Solmaz
Jonathan Fürst
204
0
0
15 Sep 2025
MindVL: Towards Efficient and Effective Training of Multimodal Large Language Models on Ascend NPUs
Feilong Chen
Y. Liu
Yi Huang
Hao Wang
Miren Tian
Ya-Qi Yu
Minghui Liao
Jihao Wu
MLLM VLM
325
1
0
15 Sep 2025
DocThinker: Explainable Multimodal Large Language Models with Rule-based Reinforcement Learning for Document Understanding
Wenwen Yu
Zhibo Yang
Yuliang Liu
Xiang Bai
MLLM OffRL LRM
92
4
0
12 Aug 2025
$R^2$-CoD: Understanding Text-Graph Complementarity in Relational Reasoning via Knowledge Co-Distillation
Zhen Wu
Ritam Dutt
Luke M. Breitfeller
Armineh Nourbakhsh
Siddharth Parekh
Carolyn Rose
150
0
0
02 Aug 2025
MMAT-1M: A Large Reasoning Dataset for Multimodal Agent Tuning
Tianhong Gao
Yannian Fu
Weiqun Wu
Haixiao Yue
Shanshan Liu
Gang Zhang
MLLM LRM
275
1
0
29 Jul 2025
DocPolarBERT: A Pre-trained Model for Document Understanding with Relative Polar Coordinate Encoding of Layout Structures
Benno Uthayasooriyar
Antoine Ly
Franck Vermet
Caio Corro
292
0
0
11 Jul 2025
Robustness Evaluation of OCR-based Visual Document Understanding under Multi-Modal Adversarial Attacks
Dong Nguyen Tien
Dung D. Le
AAML
229
0
0
19 Jun 2025
GenRecal: Generation after Recalibration from Large to Small Vision-Language Models
Byung-Kwan Lee
Ryo Hachiuma
Yong Man Ro
Yu-Chun Wang
Yueh-Hua Wu
VLM
327
3
0
18 Jun 2025
Hyper-Local Deformable Transformers for Text Spotting on Historical Maps
Knowledge Discovery and Data Mining (KDD), 2024
Yijun Lin
Yao-Yi Chiang
150
7
0
17 Jun 2025
CoMemo: LVLMs Need Image Context with Image Memory
Shi-Qi Liu
Weijie Su
Xizhou Zhu
Wenhai Wang
Jifeng Dai
VLM
218
0
0
06 Jun 2025
VRD-IU: Lessons from Visually Rich Document Intelligence and Understanding
International Joint Conference on Artificial Intelligence (IJCAI), 2024
Yihao Ding
S. Han
Yan Li
Josiah Poon
184
3
0
02 Jun 2025
Argus: Vision-Centric Reasoning with Grounded Chain-of-Thought
Computer Vision and Pattern Recognition (CVPR), 2025
Yunze Man
De-An Huang
Guilin Liu
Shiwei Sheng
Shilong Liu
Liang-Yan Gui
Jan Kautz
Yu Wang
Zhiding Yu
MLLM LRM
335
19
0
29 May 2025
FUDOKI: Discrete Flow-based Unified Understanding and Generation via Kinetic-Optimal Velocities
Jin Wang
Yao Lai
Aoxue Li
Shifeng Zhang
Jiacheng Sun
Ning Kang
Chengyue Wu
Zhenguo Li
Ping Luo
396
19
0
26 May 2025
SATORI-R1: Incentivizing Multimodal Reasoning through Explicit Visual Anchoring
Chuming Shen
Wei Wei
Xiaoye Qu
Yu Cheng
LRM
452
8
0
25 May 2025
ChartSketcher: Reasoning with Multimodal Feedback and Reflection for Chart Understanding
Muye Huang
Lingling Zhang
Jie Ma
Han Lai
Fangzhi Xu
Yifei Li
Wenjun Wu
Yaqiang Wu
Jun Liu
LRM
281
5
0
25 May 2025
Doc-CoB: Enhancing Multi-Modal Document Understanding with Visual Chain-of-Boxes Reasoning
Ye Mo
Zirui Shao
Kai Ye
Xianwei Mao
Bo Zhang
...
Gang Huang
Kehan Chen
Zhou Huan
Zixu Yan
Sheng Zhou
LRM
298
3
0
24 May 2025
FS-DAG: Few Shot Domain Adapting Graph Networks for Visually Rich Document Understanding
International Conference on Computational Linguistics (COLING), 2025
Amit Agarwal
Srikant Panda
Kulbhushan Pachauri
211
12
0
22 May 2025
Prolonged Reasoning Is Not All You Need: Certainty-Based Adaptive Routing for Efficient LLM/MLLM Reasoning
Jinghui Lu
Haiyang Yu
Siliang Xu
Shiwei Ran
Guozhi Tang
...
Teng Fu
Hao Feng
Jingqun Tang
Hongru Wang
Can Huang
LRM
399
16
0
21 May 2025
Information Extraction from Visually Rich Documents using LLM-based Organization of Documents into Independent Textual Segments
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Aniket Bhattacharyya
Anurag Tripathi
Ujjal Das
Archan Karmakar
Amit Pathak
Maneesh Gupta
190
2
0
18 May 2025
Unsupervised Visual Chain-of-Thought Reasoning via Preference Optimization
Kesen Zhao
B. Zhu
Qianru Sun
Hanwang Zhang
MLLM LRM
435
14
0
25 Apr 2025
Relation-Rich Visual Document Generator for Visual Information Extraction
Computer Vision and Pattern Recognition (CVPR), 2025
Zi-Han Jiang
Chien-Wei Lin
Wei-Hua Li
Hsuan-Tung Liu
Yi-Ren Yeh
Chu-Song Chen
272
1
0
14 Apr 2025
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models
Jinguo Zhu
Weiyun Wang
Zhe Chen
Ziwei Liu
Shenglong Ye
...
Dahua Lin
Yu Qiao
Jifeng Dai
Wenhai Wang
Wei Wang
MLLM VLM
625
806
1
14 Apr 2025
Capybara-OMNI: An Efficient Paradigm for Building Omni-Modal Language Models
Xingguang Ji
Jiakang Wang
Hongzhi Zhang
Jingyuan Zhang
Haonan Zhou
Chenxi Sun
Wenshu Fan
Qi Wang
Fuzheng Zhang
MLLM VLM
302
1
0
10 Apr 2025
Data Metabolism: An Efficient Data Design Schema For Vision Language Model
Jingyuan Zhang
Hongzhi Zhang
Zhou Haonan
Chenxi Sun
Xingguang Ji
Jiakang Wang
Fanheng Kong
Wenshu Fan
Qi Wang
Fuzheng Zhang
VLM
385
2
0
10 Apr 2025
VISTA-OCR: Towards generative and interactive end to end OCR models
Laziz Hamdi
Amine Tamasna
Pascal Boisson
Thierry Paquet
255
3
0
04 Apr 2025
Improving Applicability of Deep Learning based Token Classification models during Training
Anket Mehra
Malte Prieß
Marian Himstedt
277
0
0
28 Mar 2025
BiblioPage: A Dataset of Scanned Title Pages for Bibliographic Metadata Extraction
IEEE International Conference on Document Analysis and Recognition (ICDAR), 2025
Jan Kohút
Martin Dočekal
Michal Hradiš
Marek Vaško
196
1
0
25 Mar 2025
SlowFast-LLaVA-1.5: A Family of Token-Efficient Video Large Language Models for Long-Form Video Understanding
Mingze Xu
Mingfei Gao
Shiyu Li
Jiasen Lu
Zhe Gan
Zhengfeng Lai
Meng Cao
Kai Kang
Yue Yang
Afshin Dehghan
429
15
0
24 Mar 2025
Joint Extraction Matters: Prompt-Based Visual Question Answering for Multi-Field Document Information Extraction
Mengsay Loem
Taiju Hosaka
212
0
0
21 Mar 2025
Marten: Visual Question Answering with Mask Generation for Multi-modal Document Understanding
Computer Vision and Pattern Recognition (CVPR), 2025
Zining Wang
Tongkun Guan
Pei Fu
Chen Duan
Qianyi Jiang
Zhentao Guo
Shan Guo
Junfeng Luo
Wei Shen
Yunbo Wang
MLLM VLM
247
7
0
18 Mar 2025
An Efficient Deep Learning-Based Approach to Automating Invoice Document Validation
ACS/IEEE International Conference on Computer Systems and Applications (AICCSA), 2024
Aziz Amari
Mariem Makni
Wissal Fnaich
Akram Lahmar
Fedi Koubaa
Oumayma Charrad
Mohamed Ali Zormati
Rabaa Youssef Douss
176
3
0
15 Mar 2025
VisualPRM: An Effective Process Reward Model for Multimodal Reasoning
Weiyun Wang
Zhangwei Gao
Lawrence Yunliang Chen
Zhe Chen
Jinguo Zhu
...
Lewei Lu
Haodong Duan
Yu Qiao
Jifeng Dai
Wenhai Wang
LRM
348
87
0
13 Mar 2025
VisRL: Intention-Driven Visual Perception via Reinforced Reasoning
Zhangquan Chen
Xufang Luo
Dongsheng Li
OffRL LRM
442
23
0
10 Mar 2025
Judge a Book by its Cover: Investigating Multi-Modal LLMs for Multi-Page Handwritten Document Transcription
Benjamin Gutteridge
Matthew Thomas Jackson
Toni Kukurin
Xiaowen Dong
144
0
0
27 Feb 2025
Towards Statistical Factuality Guarantee for Large Vision-Language Models
Hao Sun
Chao Yan
Nicholas J. Jackson
Wendi Cui
B. Li
Jiaxin Zhang
Sricharan Kumar
347
0
0
27 Feb 2025
LiGT: Layout-infused Generative Transformer for Visual Question Answering on Vietnamese Receipts
International Journal on Document Analysis and Recognition (IJDAR), 2025
Thanh-Phong Le
Trung Le Chi Phan
Nghia Hieu Nguyen
Kiet Van Nguyen
ViT
286
2
0
26 Feb 2025
olmOCR: Unlocking Trillions of Tokens in PDFs with Vision Language Models
Jake Poznanski
Crystal Nam
Jon Borchardt
Jason Dunkelberger
Regan Huff
Daniel Lin
Aman Rangapur
Christopher Wilhelm
Kyle Lo
Luca Soldaini
613
40
0
25 Feb 2025