Document Understanding Dataset and Evaluation (DUDE)

IEEE International Conference on Computer Vision (ICCV), 2023

15 May 2023

Matthew Blaschko

Papers citing "Document Understanding Dataset and Evaluation (DUDE)"

SV-RAG: LoRA-Contextualizing Adaptation of MLLMs for Long Document Understanding

International Conference on Learning Representations (ICLR), 2024

259

02 Nov 2024

SimpsonsVQA: Enhancing Inquiry-Based Learning with a Tailored Dataset

Ngoc Dung Huynh

Mohamed Reda Bouadjenek

Sunil Aryal

Imran Razzak

Hakim Hacid

233

30 Oct 2024

MMDocBench: Benchmarking Large Vision-Language Models for Fine-Grained Visual Document Understanding

222

25 Oct 2024

"What is the value of {templates}?" Rethinking Document Information Extraction Datasets for LLMs

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024

158

20 Oct 2024

Towards an Improved Metric for Evaluating Disentangled Representations

Sahib Julka

Yashu Wang

Michael Granitzer

199

04 Oct 2024

Leopard: A Vision Language Model For Text-Rich Multi-Image Tasks

357

02 Oct 2024

A Survey on Multimodal Benchmarks: In the Era of Large AI Models

Lin Li

Guikun Chen

Hanrong Shi

Jun Xiao

Long Chen

343

21 Sep 2024

NVLM: Open Frontier-Class Multimodal LLMs

Wenliang Dai

Zihan Liu

301

114

17 Sep 2024

WebQuest: A Benchmark for Multimodal QA on Web Page Sequences

Jindong Chen

302

06 Sep 2024

mPLUG-DocOwl2: High-resolution Compressing for OCR-free Multi-page Document Understanding

Annual Meeting of the Association for Computational Linguistics (ACL), 2024

Anwen Hu

Haiyang Xu

Liang Zhang

Jiabo Ye

Ming Yan

Ji Zhang

Qin Jin

Fei Huang

Jingren Zhou

VLM

378

05 Sep 2024

μgat: Improving Single-Page Document Parsing by Providing Multi-Page Context

Fabio Quattrini

Carmine Zaccagnino

Silvia Cascianelli

Laura Righi

Rita Cucchiara

176

28 Aug 2024

mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models

International Conference on Learning Representations (ICLR), 2024

Ming Yan

Fei Huang

Jingren Zhou

MLLM VLM

314

225

09 Aug 2024

Deep Learning based Visually Rich Document Content Understanding: A Survey

458

02 Aug 2024

VLMEvalKit: An Open-Source Toolkit for Evaluating Large Multi-Modality Models

...

725

358

16 Jul 2024

MMLongBench-Doc: Benchmarking Long-context Document Understanding with Visualizations

...

Yu-Gang Jiang

Jiaqi Wang

Yixin Cao

Aixin Sun

ELM RALM VLM

254

01 Jul 2024

Overcoming Common Flaws in the Evaluation of Selective Classification Systems

Klaus H. Maier-Hein

255

01 Jul 2024

DocGenome: An Open Large-scale Scientific Document Benchmark for Training and Testing Multi-modal Large Language Models

Renqiu Xia

Song Mao

Xiangchao Yan

Hongbin Zhou

Bo Zhang

...

Yongwei Wang

Bin Wang

Junchi Yan

Fei Wu

Yu Qiao

252

17 Jun 2024

DistilDoc: Knowledge Distillation for Visually-Rich Document Applications

Matthew Blaschko

351

12 Jun 2024

StrucTexTv3: An Efficient Vision-Language Model for Text-rich Image Perception, Comprehension, and Beyond

...

Errui Ding

Jingdong Wang

342

31 May 2024

Notes on Applicability of GPT-4 to Document Understanding

Lukasz Borchmann

VLM

247

28 May 2024

Federated Document Visual Question Answering: A Pilot Study

Khanh Nguyen

Dimosthenis Karatzas

FedML

315

10 May 2024

Bridging the Gap Between End-to-End and Two-Step Text Spotting

Mingxin Huang

Hongliang Li

Yuliang Liu

Xiang Bai

Lianwen Jin

213

06 Apr 2024

BuDDIE: A Business Document Dataset for Multi-task Information Extraction

...

213

05 Apr 2024

Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought Reasoning

Han Xiao

346

216

25 Mar 2024

ANLS* -- A Universal Document Processing Metric for Generative Large Language Models

299

06 Feb 2024

Watermark Text Pattern Spotting in Document Images

Mateusz Krubiński

202

10 Jan 2024

GRAM: Global Reasoning for Multi-Page VQA

230

07 Jan 2024

DocLLM: A layout-aware generative language model for multimodal document understanding

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

276

106

31 Dec 2023

Privacy-Aware Document Visual Question Answering

IEEE International Conference on Document Analysis and Recognition (ICDAR), 2023

...

219

15 Dec 2023

PDFTriage: Question Answering over Long, Structured Documents

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

265

16 Sep 2023

Beyond Document Page Classification: Design, Datasets, and Challenges

IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023

Jordy Van Landeghem

Sanket Biswas

Matthew B. Blaschko

Marie-Francine Moens

212

24 Aug 2023