GeoChain: Multimodal Chain-of-Thought for Geographic Reasoning

1 June 2025
Sahiti Yerramilli
Nilay Pande
Rynaa Grover
Jayant Sravan Tamarapalli
Communities: LRM, AI4CE
Main: 7 pages · Bibliography: 2 pages · Appendix: 5 pages · 8 figures · 9 tables
Abstract

This paper introduces GeoChain, a large-scale benchmark for evaluating step-by-step geographic reasoning in multimodal large language models (MLLMs). Leveraging 1.46 million Mapillary street-level images, GeoChain pairs each image with a 21-step chain-of-thought (CoT) question sequence (over 30 million Q&A pairs). These sequences guide models from coarse attributes to fine-grained localization across four reasoning categories - visual, spatial, cultural, and precise geolocation - annotated by difficulty. Images are also enriched with semantic segmentation (150 classes) and a visual locatability score. Our benchmarking of contemporary MLLMs (GPT-4.1 variants, Claude 3.7, Gemini 2.5 variants) on a diverse 2,088-image subset reveals consistent challenges: models frequently exhibit weaknesses in visual grounding, display erratic reasoning, and struggle to achieve accurate localization, especially as the reasoning complexity escalates. GeoChain offers a robust diagnostic methodology, critical for fostering significant advancements in complex geographic reasoning within MLLMs.
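
To make the benchmark's structure described above concrete, here is a minimal Python sketch of how a single GeoChain sample might be represented: an image paired with a 21-step chain-of-thought question sequence spanning the four reasoning categories, plus the segmentation and locatability annotations. Every name in this sketch (GeoChainSample, CoTStep, locatability_score, and so on) is an illustrative assumption, not the paper's published schema or released data format.

from dataclasses import dataclass, field
from enum import Enum


class ReasoningCategory(Enum):
    """The four reasoning categories named in the abstract."""
    VISUAL = "visual"
    SPATIAL = "spatial"
    CULTURAL = "cultural"
    GEOLOCATION = "precise_geolocation"


@dataclass
class CoTStep:
    """One step of the 21-step chain-of-thought question sequence."""
    index: int                    # position within the 21-step sequence
    category: ReasoningCategory   # which of the four categories this step probes
    difficulty: str               # difficulty annotation, e.g. "easy" / "medium" / "hard"
    question: str
    answer: str


@dataclass
class GeoChainSample:
    """One benchmark item: a street-level image with its annotations."""
    image_id: str                    # Mapillary street-level image identifier
    latitude: float                  # ground-truth coordinates for the geolocation steps
    longitude: float
    locatability_score: float        # visual locatability score for the image
    segmentation_classes: list[str]  # classes present, drawn from the 150-class vocabulary
    cot_steps: list[CoTStep] = field(default_factory=list)  # coarse-to-fine question sequence


# Illustrative instantiation (values are invented, not from the dataset):
sample = GeoChainSample(
    image_id="mapillary_000001",
    latitude=48.8584,
    longitude=2.2945,
    locatability_score=0.87,
    segmentation_classes=["building", "road", "sky"],
    cot_steps=[
        CoTStep(index=1, category=ReasoningCategory.VISUAL,
                difficulty="easy",
                question="Is the scene urban or rural?",
                answer="urban"),
    ],
)

Representing each step with an explicit category and difficulty annotation mirrors the abstract's framing: sequences move from coarse attributes toward fine-grained localization, so an evaluation harness can score a model per category and per difficulty level rather than only on the final coordinate prediction.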

@article{yerramilli2025_2506.00785,
  title={GeoChain: Multimodal Chain-of-Thought for Geographic Reasoning},
  author={Sahiti Yerramilli and Nilay Pande and Rynaa Grover and Jayant Sravan Tamarapalli},
  journal={arXiv preprint arXiv:2506.00785},
  year={2025}
}