VDocRAG: Retrieval-Augmented Generation over Visually-Rich Documents

14 April 2025
Ryota Tanaka
Taichi Iki
Taku Hasegawa
Kyosuke Nishida
Kuniko Saito
Jun Suzuki
Abstract

We aim to develop a retrieval-augmented generation (RAG) framework that answers questions over a corpus of visually-rich documents presented in mixed modalities (e.g., charts, tables) and diverse formats (e.g., PDF, PPTX). In this paper, we introduce a new RAG framework, VDocRAG, which can directly understand varied documents and modalities in a unified image format, avoiding the information loss that occurs when documents are parsed into text. To improve performance, we propose novel self-supervised pre-training tasks that adapt large vision-language models for retrieval by compressing visual information into dense token representations while aligning them with textual content in documents. Furthermore, we introduce OpenDocVQA, the first unified collection of open-domain document visual question answering datasets, encompassing diverse document types and formats. OpenDocVQA provides a comprehensive resource for training and evaluating retrieval and question answering models on visually-rich documents in an open-domain setting. Experiments show that VDocRAG substantially outperforms conventional text-based RAG and has strong generalization capability, highlighting the potential of an effective RAG paradigm for real-world documents.
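To make the retrieval side of this idea concrete, here is a minimal sketch of image-based dense retrieval in the spirit described above: document pages are kept as images, each page is compressed into a single dense vector, and pages are ranked by cosine similarity to the question embedding. The two encoder classes and the function name below are hypothetical placeholders (simple linear maps), not the paper's actual vision-language model or API; only the scoring logic is illustrated.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

EMB_DIM = 128  # illustrative embedding width, not a value from the paper

class ToyPageEncoder(nn.Module):
    """Placeholder for a VLM that compresses a rendered page image into one dense vector."""
    def __init__(self, pixels: int = 3 * 224 * 224):
        super().__init__()
        self.proj = nn.Linear(pixels, EMB_DIM)

    def forward(self, pages: torch.Tensor) -> torch.Tensor:
        # pages: (N, 3, 224, 224) rendered document pages -> (N, EMB_DIM)
        return self.proj(pages.flatten(start_dim=1))

class ToyQuestionEncoder(nn.Module):
    """Placeholder for the text side of the same model."""
    def __init__(self, vocab: int = 1000):
        super().__init__()
        self.emb = nn.Embedding(vocab, EMB_DIM)

    def forward(self, ids: torch.Tensor) -> torch.Tensor:
        # ids: (B, seq_len) token ids -> (B, EMB_DIM) via mean pooling
        return self.emb(ids).mean(dim=1)

@torch.no_grad()
def retrieve_top_k(q_ids, pages, q_enc, p_enc, k=3):
    """Rank document-page images by cosine similarity to the question."""
    q = F.normalize(q_enc(q_ids), dim=-1)   # (1, d)
    d = F.normalize(p_enc(pages), dim=-1)   # (N, d)
    scores = q @ d.T                        # cosine similarities, (1, N)
    return scores.topk(k, dim=-1)

# Toy usage: 5 random "pages" and one dummy tokenized question.
pages = torch.rand(5, 3, 224, 224)
question = torch.randint(0, 1000, (1, 16))
values, indices = retrieve_top_k(question, pages, ToyQuestionEncoder(), ToyPageEncoder(), k=3)
print(indices)  # top-3 candidate pages that would be passed to the generator
```

In a full pipeline, the retrieved page images (not extracted text) would then be fed to the generator together with the question to produce the answer.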

@article{tanaka2025_2504.09795,
  title={VDocRAG: Retrieval-Augmented Generation over Visually-Rich Documents},
  author={Ryota Tanaka and Taichi Iki and Taku Hasegawa and Kyosuke Nishida and Kuniko Saito and Jun Suzuki},
  journal={arXiv preprint arXiv:2504.09795},
  year={2025}
}