2411.13112
DriveMLLM: A Benchmark for Spatial Understanding with Multimodal Large Language Models in Autonomous Driving
20 November 2024
Xianda Guo
Ruijun Zhang
Yiqun Duan
Yuhang He
Chenming Zhang
Shuai Liu
Long Chen
LRM
Papers citing
"DriveMLLM: A Benchmark for Spatial Understanding with Multimodal Large Language Models in Autonomous Driving"
9 / 9 papers shown
A Call for New Recipes to Enhance Spatial Reasoning in MLLMs
Huanyu Zhang
Chengzu Li
Wenshan Wu
Shaoguang Mao
Yan Xia
Ivan Vulić
Z. Zhang
Liang Wang
T. Tan
Furu Wei
LRM
34
1
0
21 Apr 2025
Are Vision LLMs Road-Ready? A Comprehensive Benchmark for Safety-Critical Driving Video Understanding
Tong Zeng
Longfeng Wu
Liang Shi
Dawei Zhou
Feng Guo
17
0
0
20 Apr 2025
NuScenes-SpatialQA: A Spatial Understanding and Reasoning Benchmark for Vision-Language Models in Autonomous Driving
Kexin Tian
Jingrui Mao
Y. Zhang
Jiwan Jiang
Yang Zhou
Zhengzhong Tu
CoGe
60
0
0
04 Apr 2025
LEGO-Puzzles: How Good Are MLLMs at Multi-Step Spatial Reasoning?
Kexian Tang
Junyao Gao
Yanhong Zeng
Haodong Duan
Yanan Sun
Zhening Xing
Wenran Liu
Kaifeng Lyu
Kai-xiang Chen
ELM
LRM
56
1
0
25 Mar 2025
Open3DVQA: A Benchmark for Comprehensive Spatial Reasoning with Multimodal Large Language Model in Open Space
Weichen Zhan
Zile Zhou
Zhiheng Zheng
Chen Gao
Jinqiang Cui
Y. Li
Xinlei Chen
Xiao-Ping Zhang
LRM
63
1
0
14 Mar 2025
A Framework for a Capability-driven Evaluation of Scenario Understanding for Multimodal Large Language Models in Autonomous Driving
Tin Stribor Sohn
Philipp Reis
Maximilian Dillitzer
Johannes Bach
Jason J. Corso
Eric Sax
ELM
LRM
49
0
0
14 Mar 2025
Keeping Yourself is Important in Downstream Tuning Multimodal Large Language Model
Wenke Huang
Jian Liang
Xianda Guo
Yiyang Fang
Guancheng Wan
...
Bin Yang
He Li
Jiawei Shao
Mang Ye
Bo Du
OffRL
LRM
MLLM
KELM
VLM
63
1
0
06 Mar 2025
BEVDriver: Leveraging BEV Maps in LLMs for Robust Closed-Loop Driving
Katharina Winter
Mark Azer
Fabian B. Flohr
53
0
0
05 Mar 2025
Embodied Scene Understanding for Vision Language Models via MetaVQA
Weizhen Wang
Chenda Duan
Zhenghao Peng
Yuxin Liu
Bolei Zhou
LM&Ro
44
0
0
17 Jan 2025