Document Understanding Dataset and Evaluation (DUDE)
IEEE International Conference on Computer Vision (ICCV), 2023
15 May 2023
Jordy Van Landeghem
Rubèn Pérez Tito
Łukasz Borchmann
Michał Pietruszka
Paweł Józiak
Rafał Powalski
Dawid Jurkiewicz
Mickaël Coustaty
Bertrand Ackaert
Ernest Valveny
Matthew Blaschko
Sien Moens
Tomasz Stanisławek
    VGen
arXiv: 2305.08455 · HuggingFace (3 upvotes)

Papers citing "Document Understanding Dataset and Evaluation (DUDE)"

50 / 74 papers shown
Resolving Evidence Sparsity: Agentic Context Engineering for Long-Document Understanding
Keliang Liu
Zizhi Chen
Mingcheng Li
Jingqun Tang
Dingkang Yang
Lihua Zhang
RALM
76
0
0
28 Nov 2025
Arctic-Extract Technical Report
Mateusz Chiliński
Julita Ołtusek
Wojciech Jaśkowski
VLM
116
0
0
20 Nov 2025
BBox DocVQA: A Large Scale Bounding Box Grounded Dataset for Enhancing Reasoning in Document Visual Question Answer
Wenhan Yu
Wang Chen
Guanqiang Qi
Weikang Li
Yang Li
Lei Sha
Deguo Xia
Jizhou Huang
85
1
0
19 Nov 2025
SCoPE VLM: Selective Context Processing for Efficient Document Navigation in Vision-Language Models
Gyubeum Lim
Yemo Koo
Vijay Krishna Madisetti
88
0
0
22 Oct 2025
Scaling Beyond Context: A Survey of Multimodal Retrieval-Augmented Generation for Document Understanding
Sensen Gao
Shanshan Zhao
Xu Jiang
Lunhao Duan
Yong Xien Chng
Qing-Guo Chen
Weihua Luo
Kaifu Zhang
Jia-Wang Bian
Mingming Gong
210
0
0
17 Oct 2025
Document Intelligence in the Era of Large Language Models: A Survey
Weishi Wang
Hengchang Hu
Zhijie Zhang
Zhaochen Li
Hongxin Shao
Daniel Dahlmeier
AI4TS
164
0
0
15 Oct 2025
Towards Reliable and Interpretable Document Question Answering via VLMs
Alessio Chen
Simone Giovannini
Andrea Gemelli
Fabio Coppini
S. Marinai
127
0
0
12 Sep 2025
Enhancing Document VQA Models via Retrieval-Augmented Generation
Eric López
Artemis Llabrés
Ernest Valveny
RALM
172
1
0
26 Aug 2025
DocThinker: Explainable Multimodal Large Language Models with Rule-based Reinforcement Learning for Document Understanding
Wenwen Yu
Zhibo Yang
Yuliang Liu
Xiang Bai
MLLM, OffRL, LRM
72
4
0
12 Aug 2025
DocR1: Evidence Page-Guided GRPO for Multi-Page Document Understanding
Junyu Xiong
Yonghui Wang
Weichao Zhao
Chenyu Liu
Bing Yin
Wengang Zhou
Houqiang Li
LRM
155
4
0
10 Aug 2025
VisR-Bench: An Empirical Study on Visual Retrieval-Augmented Generation for Multilingual Long Document Understanding
Jian Chen
Ming Li
Jihyung Kil
Chenguang Wang
Tong Yu
Ryan Rossi
Tianyi Zhou
Changyou Chen
Ruiyi Zhang
RALM
130
4
0
10 Aug 2025
Finding Needles in Images: Can Multimodal LLMs Locate Fine Details?
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Parth Thakkar
Ankush Agarwal
Prasad Kasu
Pulkit Bansal
Chaitanya Devaguptapu
68
0
0
07 Aug 2025
Visual Document Understanding and Reasoning: A Multi-Agent Collaboration Framework with Agent-Wise Adaptive Test-Time Scaling
Xinlei Yu
Z. Chen
Yudong Zhang
Shilin Lu
Ruolin Shen
J. Zhang
Xiaobin Hu
Yanwei Fu
Shuicheng Yan
166
13
0
05 Aug 2025
MMAT-1M: A Large Reasoning Dataset for Multimodal Agent Tuning
Tianhong Gao
Yannian Fu
Weiqun Wu
Haixiao Yue
Shanshan Liu
Gang Zhang
MLLM, LRM
221
1
0
29 Jul 2025
Multi-Agent Interactive Question Generation Framework for Long Document Understanding
Kesen Wang
Daulet Toibazar
Abdulrahman Alfulayt
Abdulaziz S. Albadawi
Ranya A. Alkahtani
Asma A. Ibrahim
Haneen A. Alhomoud
Sherif Mohamed
Pedro J. Moreno
133
3
0
27 Jul 2025
MMESGBench: Pioneering Multimodal Understanding and Complex Reasoning Benchmark for ESG Tasks
Lei Zhang
Xin Zhou
Chaoyue He
Haiyan Zhao
Y. Wu
Hong Xu
Wei Liu
Chunyan Miao
167
1
0
25 Jul 2025
Docopilot: Improving Multimodal Models for Document-Level Understanding
Computer Vision and Pattern Recognition (CVPR), 2025
Yuchen Duan
Zhe Chen
Yusong Hu
Weiyun Wang
Shenglong Ye
...
Qibin Hou
Tong Lu
Jiaming Song
Jifeng Dai
Wenhai Wang
128
9
0
19 Jul 2025
Document Haystack: A Long Context Multimodal Image/Document Understanding Vision LLM Benchmark
Goeric Huybrechts
S. Ronanki
Sai Muralidhar Jayanthi
Jack FitzGerald
Srinivasan Veeravanallur
VLM
157
0
0
18 Jul 2025
WikiMixQA: A Multimodal Benchmark for Question Answering over Tables and Charts
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Negar Foroutan
Angelika Romanou
Matin Ansaripour
Julian Martin Eisenschlos
Karl Aberer
R. Lebret
226
2
0
18 Jun 2025
MM-R5: MultiModal Reasoning-Enhanced ReRanker via Reinforcement Learning for Document Retrieval
Mingjun Xu
Jinhan Dong
Jue Hou
Zehui Wang
Cunchun Li
Zhifeng Gao
Renxin Zhong
Hengxing Cai
AI4TS, LRM
236
5
0
14 Jun 2025
Argus Inspection: Do Multimodal Large Language Models Possess the Eye of Panoptes?
Yang Yao
Lingyu Li
Jiaxin Song
Chiyu Chen
Zhenqi He
...
Xin Wang
Tianle Gu
Jie Li
Yan Teng
Yingchun Wang
LRM
254
0
0
03 Jun 2025
Argus: Vision-Centric Reasoning with Grounded Chain-of-Thought
Computer Vision and Pattern Recognition (CVPR), 2025
Yunze Man
De-An Huang
Guilin Liu
Shiwei Sheng
Shilong Liu
Liang-Yan Gui
Jan Kautz
Yu Wang
Zhiding Yu
MLLM, LRM
281
15
0
29 May 2025
ChartSketcher: Reasoning with Multimodal Feedback and Reflection for Chart Understanding
Muye Huang
Lingling Zhang
Jie Ma
Han Lai
Fangzhi Xu
Yifei Li
Wenjun Wu
Yaqiang Wu
Jun Liu
LRM
198
2
0
25 May 2025
SATORI-R1: Incentivizing Multimodal Reasoning through Explicit Visual Anchoring
Chuming Shen
Wei Wei
Xiaoye Qu
Yu Cheng
LRM
358
8
0
25 May 2025
Doc-CoB: Enhancing Multi-Modal Document Understanding with Visual Chain-of-Boxes Reasoning
Ye Mo
Zirui Shao
Kai Ye
Xianwei Mao
Bo Zhang
...
Gang Huang
Kehan Chen
Zhou Huan
Zixu Yan
Sheng Zhou
LRM
229
3
0
24 May 2025
Benchmarking Retrieval-Augmented Multimodal Generation for Document Question Answering
Kuicai Dong
Yujing Chang
Shijie Huang
Yasheng Wang
Ruiming Tang
Yong Liu
212
8
0
22 May 2025
Unsupervised Visual Chain-of-Thought Reasoning via Preference Optimization
Kesen Zhao
B. Zhu
Qianru Sun
Hanwang Zhang
MLLM, LRM
358
10
0
25 Apr 2025
VDocRAG: Retrieval-Augmented Generation over Visually-Rich Documents
Computer Vision and Pattern Recognition (CVPR), 2025
Ryota Tanaka
Taichi Iki
Taku Hasegawa
Kyosuke Nishida
Kuniko Saito
Jun Suzuki
VLM
315
21
0
14 Apr 2025
Unchecked and Overlooked: Addressing the Checkbox Blind Spot in Large Language Models with CheckboxQA
IEEE International Conference on Document Analysis and Recognition (ICDAR), 2025
M. Turski
Mateusz Chiliński
Łukasz Borchmann
206
0
0
14 Apr 2025
NoTeS-Bank: Benchmarking Neural Transcription and Search for Scientific Notes Understanding
Aniket Pal
Sanket Biswas
Alloy Das
Ayush Lodh
Priyanka Banerjee
Soumitri Chattopadhyay
Dimosthenis Karatzas
Josep Lladós
C. V. Jawahar
VLM
162
0
0
12 Apr 2025
M-DocSum: Do LVLMs Genuinely Comprehend Interleaved Image-Text in Document Summarization?
Haolong Yan
Kaijun Tan
Yeqing Shen
Xin Huang
Zheng Ge
Xiangyu Zhang
Si Li
Daxin Jiang
VLM
188
0
0
27 Mar 2025
Where is this coming from? Making groundedness count in the evaluation of Document VQA models
North American Chapter of the Association for Computational Linguistics (NAACL), 2025
Armineh Nourbakhsh
Siddharth Parekh
Pranav Shetty
Zhao Jin
Sameena Shah
Carolyn Rose
241
2
0
24 Mar 2025
A Simple yet Effective Layout Token in Large Language Models for Document Understanding
Computer Vision and Pattern Recognition (CVPR), 2025
Zhaoqing Zhu
Chuwei Luo
Zirui Shao
Feiyu Gao
Hangdi Xing
Qi Zheng
Ji Zhang
250
6
0
24 Mar 2025
MMCR: Benchmarking Cross-Source Reasoning in Scientific Papers
Yang Tian
Zheng Lu
Mingqi Gao
Zheng Liu
Bo Zhao
LRM
318
2
0
21 Mar 2025
VisRL: Intention-Driven Visual Perception via Reinforced Reasoning
Zhangquan Chen
Xufang Luo
Dongsheng Li
OffRL, LRM
401
21
0
10 Mar 2025
LiGT: Layout-infused Generative Transformer for Visual Question Answering on Vietnamese Receipts
International Journal on Document Analysis and Recognition (IJDAR), 2025
Thanh-Phong Le
Trung Le Chi Phan
Nghia Hieu Nguyen
Kiet Van Nguyen
ViT
257
2
0
26 Feb 2025
Problem Solved? Information Extraction Design Space for Layout-Rich Documents using LLMs
Gaye Colakoglu
Gürkan Solmaz
Jonathan Fürst
265
4
0
25 Feb 2025
Introducing Visual Perception Token into Multimodal Large Language Model
Runpeng Yu
Xinyin Ma
Xinchao Wang
MLLM, LRM
286
11
0
24 Feb 2025
Visual-RAG: Benchmarking Text-to-Image Retrieval Augmented Generation for Visual Knowledge Intensive Queries
Yin Wu
Quanyu Long
Jing Li
Jianfei Yu
Wenya Wang
VLM
239
9
0
23 Feb 2025
Handwritten Text Recognition: A Survey
Carlos Garrido-Muñoz
Antonio Ríos-Vila
Jorge Calvo-Zaragoza
287
5
0
12 Feb 2025
DocVLM: Make Your VLM an Efficient Reader
Computer Vision and Pattern Recognition (CVPR), 2024
Mor Shpigel Nacson
Aviad Aberdam
Roy Ganz
Elad Ben Avraham
Alona Golts
Yair Kittenplon
Shai Mazor
Ron Litman
VLM
565
10
0
11 Dec 2024
M3DocRAG: Multi-modal Retrieval is What You Need for Multi-page Multi-document Understanding
Jaemin Cho
Debanjan Mahata
Ozan Irsoy
Yujie He
Joey Tianyi Zhou
VLM
255
47
0
07 Nov 2024
NeurIPS 2023 Competition: Privacy Preserving Federated Learning Document VQA
Marlon Tobaben
Mohamed Ali Souibgui
Rubèn Pérez Tito
Khanh Nguyen
Raouf Kerkouche
...
Josep Lladós
Ernest Valveny
Antti Honkela
Mario Fritz
Dimosthenis Karatzas
FedML
353
1
0
06 Nov 2024
SV-RAG: LoRA-Contextualizing Adaptation of MLLMs for Long Document Understanding
International Conference on Learning Representations (ICLR), 2024
Jian Chen
Ruiyi Zhang
Jiuxiang Gu
Tong Yu
Franck Dernoncourt
J. Gu
Ryan Rossi
Changyou Chen
Tong Sun
222
11
0
02 Nov 2024
SimpsonsVQA: Enhancing Inquiry-Based Learning with a Tailored Dataset
Ngoc Dung Huynh
Mohamed Reda Bouadjenek
Sunil Aryal
Imran Razzak
Hakim Hacid
217
0
0
30 Oct 2024
MMDocBench: Benchmarking Large Vision-Language Models for Fine-Grained Visual Document Understanding
Fengbin Zhu
Ziyang Liu
Xiang Yao Ng
Haohui Wu
Wenjie Wang
Fuli Feng
Chao Wang
Huanbo Luan
Tat-Seng Chua
VLM
181
10
0
25 Oct 2024
"What is the value of {templates}?" Rethinking Document Information
  Extraction Datasets for LLMs
"What is the value of {templates}?" Rethinking Document Information Extraction Datasets for LLMsConference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Ran Zmigrod
Pranav Shetty
Mathieu Sibue
Zhiqiang Ma
Armineh Nourbakhsh
Xiaomo Liu
Manuela Veloso
138
4
0
20 Oct 2024
Towards an Improved Metric for Evaluating Disentangled Representations
Sahib Julka
Yashu Wang
Michael Granitzer
158
4
0
04 Oct 2024
Leopard: A Vision Language Model For Text-Rich Multi-Image Tasks
Mengzhao Jia
Wenhao Yu
Kaixin Ma
Tianqing Fang
Z. Zhang
Siru Ouyang
Hongming Zhang
Meng Jiang
Dong Yu
VLM
285
11
0
02 Oct 2024
A Survey on Multimodal Benchmarks: In the Era of Large AI Models
Lin Li
Guikun Chen
Hanrong Shi
Jun Xiao
Long Chen
335
23
0
21 Sep 2024