Can Large Vision Language Models Read Maps Like a Human?

18 March 2025
Shuo Xing
Zezhou Sun
Shuangyu Xie
Kaiyuan Chen
Yanjia Huang
Yuping Wang
Jiachen Li
Dezhen Song
Zhengzhong Tu
Abstract

In this paper, we introduce MapBench, the first dataset specifically designed for outdoor navigation over human-readable, pixel-based maps, curated from complex pathfinding scenarios. MapBench comprises over 1,600 pixel-space map pathfinding problems drawn from 100 diverse maps. In MapBench, LVLMs generate language-based navigation instructions given a map image and a query specifying beginning and end landmarks. For each map, MapBench provides a Map Space Scene Graph (MSSG) as an indexing data structure for converting to and from natural language and for evaluating LVLM-generated results. We demonstrate that MapBench significantly challenges state-of-the-art LVLMs under both zero-shot prompting and a Chain-of-Thought (CoT)-augmented reasoning framework that decomposes map navigation into sequential cognitive processes. Our evaluation of both open-source and closed-source LVLMs underscores the substantial difficulty posed by MapBench, revealing critical limitations in their spatial reasoning and structured decision-making capabilities. We release all code and the dataset at this https URL.
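The abstract does not spell out the MSSG schema or the scoring procedure, but the general idea of indexing a map as a scene graph and checking an LVLM's verbal route against it can be illustrated with a toy sketch. Everything below (the networkx graph, the landmark and road names, and the keyword-matching heuristic) is an assumption made for illustration, not the released MapBench implementation.

# Hypothetical sketch, not the authors' released code: a toy scene-graph
# index in the spirit of MapBench's MSSG, used to check whether an
# LVLM-generated instruction sequence traces a valid route between two
# landmarks. Names and the matching heuristic are assumptions.
import networkx as nx

def build_toy_mssg():
    """Landmarks as nodes, walkable road segments as named edges."""
    g = nx.Graph()
    g.add_edge("Library", "Fountain", road="Elm Street")
    g.add_edge("Fountain", "Museum", road="Oak Avenue")
    g.add_edge("Museum", "Station", road="Main Street")
    return g

def route_from_instructions(graph, start, instructions):
    """Greedily follow the road names mentioned in each instruction step."""
    current, route = start, [start]
    for step in instructions:
        for _, neighbor, data in graph.edges(current, data=True):
            if data["road"].lower() in step.lower():
                current = neighbor
                route.append(neighbor)
                break
    return route

def is_valid_route(graph, route, start, goal):
    """A route must start and end at the queried landmarks and use only existing edges."""
    if not route or route[0] != start or route[-1] != goal:
        return False
    return all(graph.has_edge(a, b) for a, b in zip(route, route[1:]))

if __name__ == "__main__":
    mssg = build_toy_mssg()
    lvlm_output = [
        "Head along Elm Street to the fountain.",
        "Turn onto Oak Avenue and continue to the museum.",
        "Follow Main Street until you reach the station.",
    ]
    route = route_from_instructions(mssg, "Library", lvlm_output)
    print(route, is_valid_route(mssg, route, "Library", "Station"))

In the actual benchmark the scene graph is derived from the map image and the evaluation is presumably richer than this keyword heuristic; the sketch only shows the general shape of graph-indexed checking of generated instructions.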

@article{xing2025_2503.14607,
  title={Can Large Vision Language Models Read Maps Like a Human?},
  author={Shuo Xing and Zezhou Sun and Shuangyu Xie and Kaiyuan Chen and Yanjia Huang and Yuping Wang and Jiachen Li and Dezhen Song and Zhengzhong Tu},
  journal={arXiv preprint arXiv:2503.14607},
  year={2025}
}