2305.08455
Document Understanding Dataset and Evaluation (DUDE)
IEEE International Conference on Computer Vision (ICCV), 2023
15 May 2023
Jordy Van Landeghem
Rubèn Pérez Tito
Łukasz Borchmann
Michal Pietruszka
Pawel Józiak
Rafal Powalski
Dawid Jurkiewicz
Mickael Coustaty
Bertrand Ackaert
Ernest Valveny
Matthew Blaschko
Sien Moens
Tomasz Stanislawek
VGen
Papers citing "Document Understanding Dataset and Evaluation (DUDE)"
31 / 81 papers shown
SV-RAG: LoRA-Contextualizing Adaptation of MLLMs for Long Document Understanding
International Conference on Learning Representations (ICLR), 2024
Jian Chen
Ruiyi Zhang
Jiuxiang Gu
Tong Yu
Franck Dernoncourt
J. Gu
Ryan Rossi
Changyou Chen
Tong Sun
259
0
0
02 Nov 2024
SimpsonsVQA: Enhancing Inquiry-Based Learning with a Tailored Dataset
Ngoc Dung Huynh
Mohamed Reda Bouadjenek
Sunil Aryal
Imran Razzak
Hakim Hacid
233
0
0
30 Oct 2024
MMDocBench: Benchmarking Large Vision-Language Models for Fine-Grained Visual Document Understanding
Fengbin Zhu
Ziyang Liu
Xiang Yao Ng
Haohui Wu
Wenjie Wang
Fuli Feng
Chao Wang
Huanbo Luan
Tat-Seng Chua
VLM
222
10
0
25 Oct 2024
"What is the value of {templates}?" Rethinking Document Information Extraction Datasets for LLMs
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Ran Zmigrod
Pranav Shetty
Mathieu Sibue
Zhiqiang Ma
Armineh Nourbakhsh
Xiaomo Liu
Manuela Veloso
158
4
0
20 Oct 2024
Towards an Improved Metric for Evaluating Disentangled Representations
Sahib Julka
Yashu Wang
Michael Granitzer
199
0
0
04 Oct 2024
Leopard: A Vision Language Model For Text-Rich Multi-Image Tasks
Mengzhao Jia
Wenhao Yu
Kaixin Ma
Tianqing Fang
Z. Zhang
Siru Ouyang
Hongming Zhang
Meng Jiang
Dong Yu
VLM
357
12
0
02 Oct 2024
A Survey on Multimodal Benchmarks: In the Era of Large AI Models
Lin Li
Guikun Chen
Hanrong Shi
Jun Xiao
Long Chen
343
23
0
21 Sep 2024
NVLM: Open Frontier-Class Multimodal LLMs
Wenliang Dai
Nayeon Lee
Wei Ping
Zhuoling Yang
Zihan Liu
Jon Barker
Tuomas Rintamaki
Mohammad Shoeybi
Bryan Catanzaro
Ming-Yu Liu
MLLM
VLM
LRM
301
114
0
17 Sep 2024
WebQuest: A Benchmark for Multimodal QA on Web Page Sequences
Maria Wang
Srinivas Sunkara
Gilles Baechler
Jason Lin
Yun Zhu
Fedir Zubach
Lei Shu
Jindong Chen
LRM
LLMAG
302
12
0
06 Sep 2024
mPLUG-DocOwl2: High-resolution Compressing for OCR-free Multi-page Document Understanding
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Anwen Hu
Haiyang Xu
Liang Zhang
Jiabo Ye
Ming Yan
Ji Zhang
Qin Jin
Fei Huang
Jingren Zhou
VLM
378
79
0
05 Sep 2024
μgat: Improving Single-Page Document Parsing by Providing Multi-Page Context
Fabio Quattrini
Carmine Zaccagnino
Silvia Cascianelli
Laura Righi
Rita Cucchiara
176
3
0
28 Aug 2024
mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models
International Conference on Learning Representations (ICLR), 2024
Jiabo Ye
Haiyang Xu
Haowei Liu
Anwen Hu
Ming Yan
Qi Qian
Ji Zhang
Fei Huang
Jingren Zhou
MLLM
VLM
314
225
0
09 Aug 2024
Deep Learning based Visually Rich Document Content Understanding: A Survey
Muhammad Ali
Jean Lee
Salman Khan
Eduard Hovy
458
15
0
02 Aug 2024
VLMEvalKit: An Open-Source Toolkit for Evaluating Large Multi-Modality Models
Haodong Duan
Xinyu Fang
Junming Yang
Xiangyu Zhao
Lin Chen
...
Yuhang Zang
Pan Zhang
Jiaqi Wang
Dahua Lin
Kai Chen
LM&MA
VLM
725
358
0
16 Jul 2024
MMLongBench-Doc: Benchmarking Long-context Document Understanding with Visualizations
Yubo Ma
Yuhang Zang
Liangyu Chen
Meiqi Chen
Yizhu Jiao
...
Liangming Pan
Yu-Gang Jiang
Jiaqi Wang
Yixin Cao
Aixin Sun
ELM
RALM
VLM
254
92
0
01 Jul 2024
Overcoming Common Flaws in the Evaluation of Selective Classification Systems
Jeremias Traub
Till J. Bungert
Carsten T. Lüth
Michael Baumgartner
Klaus H. Maier-Hein
Lena Maier-Hein
Paul F. Jaeger
255
11
0
01 Jul 2024
DocGenome: An Open Large-scale Scientific Document Benchmark for Training and Testing Multi-modal Large Language Models
Renqiu Xia
Song Mao
Xiangchao Yan
Hongbin Zhou
Bo Zhang
...
Yongwei Wang
Bin Wang
Junchi Yan
Fei Wu
Yu Qiao
252
24
0
17 Jun 2024
DistilDoc: Knowledge Distillation for Visually-Rich Document Applications
Jordy Van Landeghem
Subhajit Maity
Ayan Banerjee
Matthew Blaschko
Marie-Francine Moens
Josep Lladós
Sanket Biswas
351
4
0
12 Jun 2024
StrucTexTv3: An Efficient Vision-Language Model for Text-rich Image Perception, Comprehension, and Beyond
Pengyuan Lyu
Yulin Li
Hao Zhou
Weihong Ma
Xingyu Wan
...
Liang Wu
Chengquan Zhang
Kun Yao
Errui Ding
Jingdong Wang
342
12
0
31 May 2024
Notes on Applicability of GPT-4 to Document Understanding
Lukasz Borchmann
VLM
247
7
0
28 May 2024
Federated Document Visual Question Answering: A Pilot Study
Khanh Nguyen
Dimosthenis Karatzas
FedML
315
0
0
10 May 2024
Bridging the Gap Between End-to-End and Two-Step Text Spotting
Mingxin Huang
Hongliang Li
Yuliang Liu
Xiang Bai
Lianwen Jin
213
11
0
06 Apr 2024
BuDDIE: A Business Document Dataset for Multi-task Information Extraction
Ran Zmigrod
Dongsheng Wang
Mathieu Sibue
Yulong Pei
Petr Babkin
...
Antony Papadimitriou
William Watson
Zhiqiang Ma
Armineh Nourbakhsh
Sameena Shah
213
7
0
05 Apr 2024
Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought Reasoning
Hao Shao
Shengju Qian
Han Xiao
Guanglu Song
Zhuofan Zong
Letian Wang
Yu Liu
Jiaming Song
VGen
LRM
MLLM
346
216
0
25 Mar 2024
ANLS* -- A Universal Document Processing Metric for Generative Large Language Models
David Peer
Philemon Schöpf
V. Nebendahl
A. Rietzler
Sebastian Stabinger
299
8
0
06 Feb 2024
Watermark Text Pattern Spotting in Document Images
Mateusz Krubiński
Stefan Matcovici
Diana Grigore
Daniel Voinea
A. Popa
WaLM
202
3
0
10 Jan 2024
GRAM: Global Reasoning for Multi-Page VQA
Tsachi Blau
Sharon Fogel
Roi Ronen
Alona Golts
Roy Ganz
Elad Ben Avraham
Aviad Aberdam
Shahar Tsiper
Ron Litman
230
21
0
07 Jan 2024
DocLLM: A layout-aware generative language model for multimodal document understanding
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Dongsheng Wang
Natraj Raman
Mathieu Sibue
Zhiqiang Ma
Petr Babkin
Simerjot Kaur
Yulong Pei
Armineh Nourbakhsh
Xiaomo Liu
VLM
276
106
0
31 Dec 2023
Privacy-Aware Document Visual Question Answering
IEEE International Conference on Document Analysis and Recognition (ICDAR), 2023
Rubèn Pérez Tito
Khanh Nguyen
Marlon Tobaben
Raouf Kerkouche
Mohamed Ali Souibgui
...
Lei Kang
Ernest Valveny
Antti Honkela
Mario Fritz
Dimosthenis Karatzas
219
16
0
15 Dec 2023
PDFTriage: Question Answering over Long, Structured Documents
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Jon Saad-Falcon
Joe Barrow
Alexa F. Siu
A. Nenkova
David Seunghyun Yoon
Ryan Rossi
Franck Dernoncourt
RALM
265
29
0
16 Sep 2023
Beyond Document Page Classification: Design, Datasets, and Challenges
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Jordy Van Landeghem
Sanket Biswas
Matthew B. Blaschko
Marie-Francine Moens
212
9
0
24 Aug 2023