
arXiv: 2507.15882

Document Haystack: A Long Context Multimodal Image/Document Understanding Vision LLM Benchmark

18 July 2025
Goeric Huybrechts
S. Ronanki
Sai Muralidhar Jayanthi
Jack FitzGerald
Srinivasan Veeravanallur
    VLM
Links: arXiv (abs) · PDF · HTML · HuggingFace
Main: 7 pages, 4 figures; Bibliography: 2 pages; 7 tables
Abstract

The proliferation of multimodal Large Language Models has significantly advanced the ability to analyze and understand complex data inputs from different modalities. However, the processing of long documents remains under-explored, largely due to a lack of suitable benchmarks. To address this, we introduce Document Haystack, a comprehensive benchmark designed to evaluate the performance of Vision Language Models (VLMs) on long, visually complex documents. Document Haystack features documents ranging from 5 to 200 pages and strategically inserts pure-text or multimodal text+image "needles" at various depths within the documents to challenge VLMs' retrieval capabilities. Comprising 400 document variants and a total of 8,250 questions, it is supported by an objective, automated evaluation framework. We detail the construction and characteristics of the Document Haystack dataset, present results from prominent VLMs, and discuss potential research avenues in this area.
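The "needle at a given depth" setup described in the abstract can be sketched in a few lines. This is a hypothetical illustration only: the function and variable names are invented here, and the actual benchmark inserts needles into rendered document images rather than plain text. The exact-match check mirrors the idea of an objective, automated evaluation.

```python
# Illustrative sketch (not the benchmark's actual API): place a short
# text "needle" at a chosen depth within a multi-page document, then
# score a model's answer with an objective exact-match check.

def insert_needle(pages, needle, depth_pct):
    """Append `needle` to the page closest to `depth_pct` percent
    of the way through `pages`; returns a new page list."""
    idx = min(len(pages) - 1, int(len(pages) * depth_pct / 100))
    pages = list(pages)
    pages[idx] = pages[idx] + "\n" + needle
    return pages

def evaluate(model_answer, expected):
    """Automated check: exact match after simple normalization."""
    return model_answer.strip().lower() == expected.strip().lower()

# Usage: a 10-page toy document with a needle at 50% depth.
doc = [f"Page {i} body text." for i in range(1, 11)]
doc = insert_needle(doc, 'The secret fruit is "apple".', 50)
print("apple" in doc[5])           # True
print(evaluate("  Apple ", "apple"))  # True
```

Sweeping `depth_pct` over a document set of varying lengths is what lets a benchmark like this probe whether retrieval accuracy degrades with context position and document size.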
