ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

Can MLLMs Guide Me Home? A Benchmark Study on Fine-Grained Visual Reasoning from Transit Maps

24 May 2025
Sicheng Feng
Song Wang
Shuyi Ouyang
Lingdong Kong
Zikai Song
Jianke Zhu
Huan Wang
Xinchao Wang
    LRM

Papers citing "Can MLLMs Guide Me Home? A Benchmark Study on Fine-Grained Visual Reasoning from Transit Maps"

48 / 48 papers shown
Looking Beyond Language Priors: Enhancing Visual Comprehension and Attention in Multimodal Models
Aarti Ghatkesar
Uddeshya Upadhyay
Ganesh Venkatesh
VLM
48
1
0
08 May 2025
R-Bench: Graduate-level Multi-disciplinary Benchmarks for LLM & MLLM Complex Reasoning Evaluation
Meng-Hao Guo
Jiajun Xu
Yi Zhang
Jiaxi Song
Haoyang Peng
...
Yongming Rao
Houwen Peng
Han Hu
Gordon Wetzstein
Shi-Min Hu
ELM
LRM
74
3
0
04 May 2025
Reinforcement Learning for Reasoning in Large Language Models with One Training Example
Yiping Wang
Qing Yang
Zhiyuan Zeng
Liliang Ren
Liu Liu
...
Jianfeng Gao
Weizhu Chen
Shuaiqiang Wang
Simon Shaolei Du
Yelong Shen
OffRL
ReLM
LRM
214
23
0
29 Apr 2025
Skywork R1V2: Multimodal Hybrid Reinforcement Learning for Reasoning
Chris
Yichen Wei
Yi Peng
Xiang Wang
Weijie Qiu
...
Jianhao Zhang
Y. Hao
Xuchen Song
Yang Liu
Yahui Zhou
OffRL
AI4TS
SyDa
LRM
VLM
96
4
0
23 Apr 2025
VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models
Weiye Xu
Jun Wang
Weiyun Wang
Zhe Chen
Wengang Zhou
...
Xiaohua Wang
Xizhou Zhu
Wenhai Wang
Jifeng Dai
Jinguo Zhu
VLM
LRM
114
5
0
21 Apr 2025
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
Yang Yue
Zhiqi Chen
Rui Lu
Andrew Zhao
Zhaokai Wang
Yang Yue
Shiji Song
Gao Huang
ReLM
LRM
114
55
0
18 Apr 2025
Instruction-augmented Multimodal Alignment for Image-Text and Element Matching
Xinli Yue
Jianhui Sun
Junda Lu
Liangchao Yao
Fan Xia
Tianyi Wang
Fengyun Rao
Jing Lyu
Yuetang Deng
66
2
0
16 Apr 2025
Efficient Reasoning Models: A Survey
Sicheng Feng
Gongfan Fang
Xinyin Ma
Xinchao Wang
ReLM
LRM
284
10
0
15 Apr 2025
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models
Jinguo Zhu
Weiyun Wang
Zhe Chen
Ziwei Liu
Shenglong Ye
...
Dahua Lin
Yu Qiao
Jifeng Dai
Wenhai Wang
Wei Wang
MLLM
VLM
104
56
1
14 Apr 2025
GeoNav: Empowering MLLMs with Explicit Geospatial Reasoning Abilities for Language-Goal Aerial Navigation
Haotian Xu
Yue Hu
Chen Gao
Zhengqiu Zhu
Yong Zhao
Yongqian Li
Quanjun Yin
78
2
0
13 Apr 2025
SF2T: Self-supervised Fragment Finetuning of Video-LLMs for Fine-Grained Understanding
Yangliu Hu
Zikai Song
Na Feng
Yawei Luo
Junqing Yu
Yi-Ping Phoebe Chen
Wei Yang
51
1
0
10 Apr 2025
Kimi-VL Technical Report
Kimi Team
Angang Du
B. Yin
Bowei Xing
Bowen Qu
...
Zhiqi Huang
Zihao Huang
Zijia Zhao
Zhe Chen
Zongyu Lin
MLLM
VLM
MoE
280
14
0
10 Apr 2025
Visual-RFT: Visual Reinforcement Fine-Tuning
Ziyu Liu
Zeyi Sun
Yuhang Zang
Xiaoyi Dong
Yuhang Cao
Haodong Duan
Dahua Lin
Jiaqi Wang
ObjD
VLM
LRM
112
76
0
03 Mar 2025
Med-RLVR: Emerging Medical Reasoning from a 3B base model via reinforcement Learning
Sheng Zhang
Qianchu Liu
Guanghui Qin
Tristan Naumann
Hoifung Poon
ReLM
OffRL
LRM
107
5
0
27 Feb 2025
Qwen2.5-VL Technical Report
S. Bai
Keqin Chen
Xuejing Liu
Jialin Wang
Wenbin Ge
...
Zesen Cheng
Hang Zhang
Zhibo Yang
Haiyang Xu
Junyang Lin
VLM
157
430
0
20 Feb 2025
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
DeepSeek-AI
Daya Guo
Dejian Yang
Haowei Zhang
Junxiao Song
...
Shiyu Wang
S. Yu
Shunfeng Zhou
Shuting Pan
S.S. Li
ReLM
VLM
OffRL
AI4TS
LRM
268
1,503
0
22 Jan 2025
Can MLLMs Reason in Multimodality? EMMA: An Enhanced MultiModal ReAsoning Benchmark
Yunzhuo Hao
Jiawei Gu
Huichen Will Wang
Linjie Li
Zhiyong Yang
Lijuan Wang
Yu Cheng
LRM
76
30
0
10 Jan 2025
Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data, and Metric Perspectives
Shaoyuan Xie
Lingdong Kong
Yuhao Dong
Chonghao Sima
Wenwei Zhang
Qi Alfred Chen
Ziwei Liu
Liang Pan
283
16
0
08 Jan 2025
MapEval: A Map-Based Evaluation of Geo-Spatial Reasoning in Foundation Models
Mahir Labib Dihan
Md Tanvir Hassan
Md Tanvir Parvez
Md Hasebul Hasan
Md Almash Alam
Muhammad Aamir Cheema
Mohammed Eunus Ali
Md. Rizwan Parvez
ELM
LRM
91
2
0
31 Dec 2024
Visual-Linguistic Agent: Towards Collaborative Contextual Object Reasoning
Jingru Yang
Huan Yu
Yang Jingxin
C. Xu
Yin Biao
Yu Sun
Shengfeng He
36
1
0
15 Nov 2024
An Empirical Analysis on Spatial Reasoning Capabilities of Large Multimodal Models
Fatemeh Shiri
Xiao-Yu Guo
Mona Golestan Far
Xin-Yao Yu
Gholamreza Haffari
Yuan-Fang Li
LRM
49
15
0
09 Nov 2024
Understanding the Role of LLMs in Multimodal Evaluation Benchmarks
Botian Jiang
Lei Li
Xiaonan Li
Zhaowei Li
Xiachong Feng
Dianbo Sui
Qiang Liu
Xipeng Qiu
65
3
0
16 Oct 2024
MathGLM-Vision: Solving Mathematical Problems with Multi-Modal Large Language Model
Zhen Yang
Jinhao Chen
Zhengxiao Du
Wenmeng Yu
Weihan Wang
Wenyi Hong
Zhihuan Jiang
Bin Xu
Yuxiao Dong
Jie Tang
VLM
LRM
48
10
0
10 Sep 2024
SAM4MLLM: Enhance Multi-Modal Large Language Model for Referring Expression Segmentation
Yi-Chia Chen
Wei-Hua Li
Cheng Sun
Yu-Chiang Frank Wang
Chu-Song Chen
VLM
68
16
0
01 Sep 2024
Perceive, Reflect, and Plan: Designing LLM Agent for Goal-Directed City Navigation without Instructions
Qingbin Zeng
Qinglong Yang
Shunan Dong
Heming Du
Liang Zheng
Fengli Xu
Yong Li
LLMAG
LM&Ro
67
12
0
08 Aug 2024
Qwen2 Technical Report
An Yang
Baosong Yang
Binyuan Hui
Jian Xu
Bowen Yu
...
Yuqiong Liu
Zeyu Cui
Zhenru Zhang
Zhifang Guo
Zhi-Wei Fan
OSLM
VLM
MU
90
875
0
15 Jul 2024
TokenPacker: Efficient Visual Projector for Multimodal LLM
Wentong Li
Yuqian Yuan
Jian Liu
Dongqi Tang
Song Wang
Jie Qin
Jianke Zhu
Lei Zhang
MLLM
55
57
0
02 Jul 2024
OMG-LLaVA: Bridging Image-level, Object-level, Pixel-level Reasoning and Understanding
Tao Zhang
Xiangtai Li
Hao Fei
Haobo Yuan
Shengqiong Wu
Shunping Ji
Chen Change Loy
Shuicheng Yan
LRM
MLLM
VLM
74
56
0
27 Jun 2024
CityBench: Evaluating the Capabilities of Large Language Models for Urban Tasks
Jie Feng
Jun Zhang
Junbo Yan
Xin Zhang
Tianjian Ouyang
Junbo Yan
Tianhui Liu
Siqi Guo
Yong Li
ELM
115
4
0
20 Jun 2024
PlanAgent: A Multi-modal Large Language Agent for Closed-loop Vehicle Motion Planning
Yupeng Zheng
Zebin Xing
Qichao Zhang
Bu Jin
Pengfei Li
...
Zhongpu Xia
Kun Zhan
Xianpeng Lang
Yaran Chen
Dongbin Zhao
LM&Ro
LRM
LLMAG
83
20
0
03 Jun 2024
Hallucination of Multimodal Large Language Models: A Survey
Zechen Bai
Pichao Wang
Tianjun Xiao
Tong He
Zongbo Han
Zheng Zhang
Mike Zheng Shou
VLM
LRM
136
167
0
29 Apr 2024
MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?
Renrui Zhang
Dongzhi Jiang
Yichi Zhang
Haokun Lin
Ziyu Guo
...
Aojun Zhou
Pan Lu
Kai-Wei Chang
Peng Gao
Hongsheng Li
50
205
0
21 Mar 2024
Rotary Position Embedding for Vision Transformer
Byeongho Heo
Song Park
Dongyoon Han
Sangdoo Yun
86
41
0
20 Mar 2024
Measuring Multimodal Mathematical Reasoning with MATH-Vision Dataset
Ke Wang
Junting Pan
Weikang Shi
Zimu Lu
Mingjie Zhan
Hongsheng Li
46
140
0
22 Feb 2024
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Zhihong Shao
Peiyi Wang
Qihao Zhu
Runxin Xu
Jun-Mei Song
...
Haowei Zhang
Mingchuan Zhang
Yiming Li
Yu-Huan Wu
Daya Guo
ReLM
LRM
91
953
0
05 Feb 2024
V*: Guided Visual Search as a Core Mechanism in Multimodal LLMs
Penghao Wu
Saining Xie
LRM
73
143
0
21 Dec 2023
PixelLM: Pixel Reasoning with Large Multimodal Model
Zhongwei Ren
Zhicheng Huang
Yunchao Wei
Yao-Min Zhao
Dongmei Fu
Jiashi Feng
Xiaojie Jin
VLM
MLLM
LRM
52
98
0
04 Dec 2023
MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI
Xiang Yue
Yuansheng Ni
Kai Zhang
Tianyu Zheng
Ruoqi Liu
...
Yibo Liu
Wenhao Huang
Huan Sun
Yu-Chuan Su
Wenhu Chen
OSLM
ELM
VLM
153
833
0
27 Nov 2023
A Survey on Multimodal Large Language Models for Autonomous Driving
Can Cui
Yunsheng Ma
Xu Cao
Wenqian Ye
Yang Zhou
...
Xinrui Yan
Shuqi Mei
Jianguo Cao
Ziran Wang
Chao Zheng
76
271
0
21 Nov 2023
A Survey on Large Language Model based Autonomous Agents
Lei Wang
Chengbang Ma
Xueyang Feng
Zeyu Zhang
Hao-ran Yang
...
Xu Chen
Yankai Lin
Wayne Xin Zhao
Zhewei Wei
Ji-Rong Wen
LLMAG
AI4CE
LM&Ro
68
1,225
0
22 Aug 2023
Kosmos-2: Grounding Multimodal Large Language Models to the World
Zhiliang Peng
Wenhui Wang
Li Dong
Y. Hao
Shaohan Huang
Shuming Ma
Furu Wei
MLLM
ObjD
VLM
89
735
0
26 Jun 2023
What You See is What You Read? Improving Text-Image Alignment Evaluation
Michal Yarom
Yonatan Bitton
Soravit Changpinyo
Roee Aharoni
Jonathan Herzig
Oran Lang
E. Ofek
Idan Szpektor
EGVM
76
80
0
17 May 2023
LiDAR2Map: In Defense of LiDAR-Based Semantic Map Construction Using Online Camera Distillation
Song Wang
Wentong Li
Wenyu Liu
Xiaolu Liu
Jianke Zhu
61
18
0
22 Apr 2023
GPT-4 Technical Report
OpenAI
Josh Achiam
Steven Adler
Sandhini Agarwal
Lama Ahmad
...
Shengjia Zhao
Tianhao Zheng
Juntang Zhuang
William Zhuk
Barret Zoph
LLMAG
MLLM
797
13,788
0
15 Mar 2023
A Survey of Embodied AI: From Simulators to Research Tasks
Jiafei Duan
Samson Yu
Tangyao Li
Huaiyu Zhu
Cheston Tan
LM&Ro
55
280
0
08 Mar 2021
Measuring Massive Multitask Language Understanding
Dan Hendrycks
Collin Burns
Steven Basart
Andy Zou
Mantas Mazeika
D. Song
Jacob Steinhardt
ELM
RALM
146
4,222
0
07 Sep 2020
Scaling Laws for Neural Language Models
Jared Kaplan
Sam McCandlish
T. Henighan
Tom B. Brown
B. Chess
R. Child
Scott Gray
Alec Radford
Jeff Wu
Dario Amodei
476
4,662
0
23 Jan 2020
Learning from Maps: Visual Common Sense for Autonomous Driving
Ari Seff
Jianxiong Xiao
SSL
32
45
0
25 Nov 2016