2311.12320
Cited By
A Survey on Multimodal Large Language Models for Autonomous Driving
21 November 2023
Can Cui
Yunsheng Ma
Xu Cao
Wenqian Ye
Yang Zhou
Kaizhao Liang
Jintai Chen
Juanwu Lu
Zichong Yang
Kuei-Da Liao
Tianren Gao
Erlong Li
Kun Tang
Zhipeng Cao
Tongxi Zhou
Ao Liu
Xinrui Yan
Shuqi Mei
Jianguo Cao
Ziran Wang
Chao Zheng
Papers citing
"A Survey on Multimodal Large Language Models for Autonomous Driving"
50 / 179 papers shown
Title
Towards Human-Centric Autonomous Driving: A Fast-Slow Architecture Integrating Large Language Model Guidance with Reinforcement Learning
Chengkai Xu
Jiaqi Liu
Yicheng Guo
Y. Zhang
Peng Hang
Jian-jun Sun
21
0
0
11 May 2025
Batch Augmentation with Unimodal Fine-tuning for Multimodal Learning
H. M. D. Kabir
S. Mondal
Mohammad Ali Moni
19
0
0
10 May 2025
X-Driver: Explainable Autonomous Driving with Vision-Language Models
Wei Liu
J. A. Zhang
Binxiong Zheng
Yufeng Hu
Yingzhan Lin
Zengfeng Zeng
VLM
LRM
48
0
0
08 May 2025
LVLM-MPC Collaboration for Autonomous Driving: A Safety-Aware and Task-Scalable Control Architecture
Kazuki Atsuta
Kohei Honda
H. Okuda
Tatsuya Suzuki
59
0
0
08 May 2025
Adaptive Stress Testing Black-Box LLM Planners
Neeloy Chakraborty
John Pohovey
Melkior Ornik
Katherine Driggs-Campbell
23
0
0
08 May 2025
Task-Oriented Semantic Communication in Large Multimodal Models-based Vehicle Networks
Baoxia Du
H. Du
Dusit Niyato
Ruidong Li
51
0
0
05 May 2025
Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities
X. Zhang
Jintao Guo
Shanshan Zhao
Minghao Fu
Lunhao Duan
Guo-Hua Wang
Qing-Guo Chen
Zhao Xu
Weihua Luo
Kaifu Zhang
DiffM
57
0
0
05 May 2025
Vision and Intention Boost Large Language Model in Long-Term Action Anticipation
Congqi Cao
Lanshu Hu
Yating Yu
Y. Zhang
VLM
49
0
0
03 May 2025
Towards Explainable AI: Multi-Modal Transformer for Video-based Image Description Generation
Lakshita Agarwal
Bindu Verma
ViT
22
0
0
23 Apr 2025
Efficient Reasoning Models: A Survey
Sicheng Feng
Gongfan Fang
Xinyin Ma
Xinchao Wang
ReLM
LRM
55
0
0
15 Apr 2025
ReasonDrive: Efficient Visual Question Answering for Autonomous Vehicles with Reasoning-Enhanced Small Vision-Language Models
Amirhosein Chahe
Lifeng Zhou
LRM
33
0
0
14 Apr 2025
Decoupling Contrastive Decoding: Robust Hallucination Mitigation in Multimodal Large Language Models
Wei Chen
Xin Yan
Bin Wen
Fan Yang
Tingting Gao
Di Zhang
Long Chen
MLLM
87
0
0
09 Apr 2025
Zeus: Zero-shot LLM Instruction for Union Segmentation in Multimodal Medical Imaging
Siyuan Dai
Kai Ye
Guodong Liu
Haoteng Tang
Liang Zhan
MedIm
19
0
0
09 Apr 2025
Multimodal Agricultural Agent Architecture (MA3): A New Paradigm for Intelligent Agricultural Decision-Making
Zhuoning Xu
Jian Xu
M. Zhang
P. Wang
Chao Deng
Cheng-Lin Liu
26
0
0
07 Apr 2025
A Survey of Large Language Models in Mental Health Disorder Detection on Social Media
Zhuohan Ge
Nicole Hu
Darian Li
Yubo Wang
Shihao Qi
Yuming Xu
Han Shi
J. Zhang
AI4MH
54
0
0
03 Apr 2025
Multifaceted Evaluation of Audio-Visual Capability for MLLMs: Effectiveness, Efficiency, Generalizability and Robustness
Yusheng Zhao
Junyu Luo
Xiao Luo
Weizhi Zhang
Zhiping Xiao
Wei Ju
Philip S. Yu
Ming Zhang
AuLLM
37
0
0
03 Apr 2025
AdPO: Enhancing the Adversarial Robustness of Large Vision-Language Models with Preference Optimization
Chaohu Liu
Tianyi Gui
Yu Liu
Linli Xu
VLM
AAML
68
1
0
02 Apr 2025
A Survey of Reinforcement Learning-Based Motion Planning for Autonomous Driving: Lessons Learned from a Driving Task Perspective
Zhuoren Li
Guizhe Jin
Ran Yu
Z. Chen
Nan I. Li
...
Lu Xiong
Bo Leng
Jia Hu
I. Kolmanovsky
Dimitar Filev
44
0
0
31 Mar 2025
Embodied-Reasoner: Synergizing Visual Search, Reasoning, and Action for Embodied Interactive Tasks
W. Zhang
Mengna Wang
Gangao Liu
Xu Huixin
Yiwei Jiang
...
Hang Zhang
Xin Li
Weiming Lu
Peng Li
Y. Zhuang
LM&Ro
LRM
65
2
0
27 Mar 2025
ADS-Edit: A Multimodal Knowledge Editing Dataset for Autonomous Driving Systems
Chenxi Wang
Jizhan Fang
Xiang Chen
Bozhong Tian
Ziwen Xu
H. Chen
N. Zhang
KELM
92
0
0
26 Mar 2025
DWIM: Towards Tool-aware Visual Reasoning via Discrepancy-aware Workflow Generation & Instruct-Masking Tuning
Fucai Ke
Vijay Kumar B G
Xingjian Leng
Zhixi Cai
Zaid Khan
Weiqing Wang
P. D. Haghighi
H. Rezatofighi
Manmohan Chandraker
42
0
0
25 Mar 2025
Predicting the Road Ahead: A Knowledge Graph based Foundation Model for Scene Understanding in Autonomous Driving
Hongkuan Zhou
Stefan Schmid
Yicong Li
Lavdim Halilaj
Xiangtong Yao
Wei Cao
52
0
0
24 Mar 2025
A Novel Hat-Shaped Device-Cloud Collaborative Inference Framework for Large Language Models
Zuan Xie
Yang Xu
Hongli Xu
Yunming Liao
Zhiwei Yao
49
0
0
23 Mar 2025
Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models
Yang Sui
Yu-Neng Chuang
Guanchu Wang
Jiamu Zhang
Tianyi Zhang
...
Hongyi Liu
Andrew Wen
Shaochen Zhong
Hanjie Chen
OffRL
ReLM
LRM
62
21
0
20 Mar 2025
Aligning Multimodal LLM with Human Preference: A Survey
Tao Yu
Y. Zhang
Chaoyou Fu
Junkang Wu
Jinda Lu
...
Qingsong Wen
Z. Zhang
Yan Huang
Liang Wang
T. Tan
73
2
0
18 Mar 2025
Task-Oriented Feature Compression for Multimodal Understanding via Device-Edge Co-Inference
Cheng Yuan
Z. Liu
Jiashu Lv
Jiawei Shao
Yufei Jiang
J. Zhang
Xuelong Li
43
0
0
17 Mar 2025
3DAxisPrompt: Promoting the 3D Grounding and Reasoning in GPT-4o
Dingning Liu
Cheng Wang
Peng Gao
Renrui Zhang
Xinzhu Ma
Yuan Meng
Zhihui Wang
LRM
39
0
0
17 Mar 2025
O-TPT: Orthogonality Constraints for Calibrating Test-time Prompt Tuning in Vision-Language Models
Ashshak Sharifdeen
Muhammad Akhtar Munir
Sanoojan Baliah
Salman Khan
M. H. Khan
VLM
47
0
0
15 Mar 2025
A Framework for a Capability-driven Evaluation of Scenario Understanding for Multimodal Large Language Models in Autonomous Driving
Tin Stribor Sohn
Philipp Reis
Maximilian Dillitzer
Johannes Bach
Jason J. Corso
Eric Sax
ELM
LRM
49
0
0
14 Mar 2025
Attention Reallocation: Towards Zero-cost and Controllable Hallucination Mitigation of MLLMs
Chongjun Tu
Peng Ye
Dongzhan Zhou
Lei Bai
Gang Yu
Tao Chen
Wanli Ouyang
56
0
0
13 Mar 2025
Talk2PC: Enhancing 3D Visual Grounding through LiDAR and Radar Point Clouds Fusion for Autonomous Driving
Runwei Guan
Jianan Liu
Ningwei Ouyang
Daizong Liu
Xiaolou Sun
Lianqing Zheng
Ming Xu
Yutao Yue
Hui Xiong
61
1
0
11 Mar 2025
A Cascading Cooperative Multi-agent Framework for On-ramp Merging Control Integrating Large Language Models
Miao Zhang
Zhenlong Fang
Tianyi Wang
Q. Zhang
Shuai Lu
Junfeng Jiao
Tianyu Shi
AI4CE
53
4
0
11 Mar 2025
DualDiff+: Dual-Branch Diffusion for High-Fidelity Video Generation with Reward Guidance
Zhao Yang
Zezhong Qian
Xiaofan Li
Weixiang Xu
Gongpeng Zhao
Ruohong Yu
Lingsi Zhu
Longjun Liu
DiffM
VGen
61
1
0
05 Mar 2025
MV-MATH: Evaluating Multimodal Math Reasoning in Multi-Visual Contexts
P. Wang
Zhongzhi Li
Fei Yin
Dekang Ran
Chenglin Liu
Cheng-Lin Liu
LRM
42
3
0
28 Feb 2025
HazardNet: A Small-Scale Vision Language Model for Real-Time Traffic Safety Detection at Edge Devices
M. Tami
Mohammed Elhenawy
Huthaifa I. Ashqar
36
0
0
27 Feb 2025
Multi-Agent Autonomous Driving Systems with Large Language Models: A Survey of Recent Advances
Yaozu Wu
Dongyuan Li
Yankai Chen
Renhe Jiang
Henry Peng Zou
Liancheng Fang
Zhen Wang
Philip S. Yu
LLMAG
66
1
0
24 Feb 2025
CurricuVLM: Towards Safe Autonomous Driving via Personalized Safety-Critical Curriculum Learning with Vision-Language Models
Zihao Sheng
Zilin Huang
Yansong Qu
Yue Leng
Sruthi Bhavanam
Sikai Chen
42
2
0
24 Feb 2025
MQADet: A Plug-and-Play Paradigm for Enhancing Open-Vocabulary Object Detection via Multimodal Question Answering
Caixiong Li
Xiongwei Zhao
Jinhang Zhang
Xing Zhang
Qihao Sun
Zhou Wu
ObjD
MLLM
VLM
51
0
0
23 Feb 2025
TeLL-Drive: Enhancing Autonomous Driving with Teacher LLM-Guided Deep Reinforcement Learning
Chengkai Xu
Jiaqi Liu
Shiyu Fang
Jian-jun Sun
Dong Chen
Peng Hang
Jian Sun
81
0
0
21 Feb 2025
Large Language Models for Autonomous Driving (LLM4AD): Concept, Benchmark, Experiments, and Challenges
Can Cui
Yunsheng Ma
Zichong Yang
Yupeng Zhou
Peiran Liu
...
Jitesh Panchal
Amr Abdelraouf
Rohit Gupta
Kyungtae Han
Ziran Wang
44
1
0
21 Feb 2025
Distraction is All You Need for Multimodal Large Language Model Jailbreaking
Zuopeng Yang
Jiluan Fan
Anli Yan
Erdun Gao
Xin Lin
Tao Li
Kanghua Mo
Changyu Dong
AAML
70
0
0
15 Feb 2025
SimBEV: A Synthetic Multi-Task Multi-Sensor Driving Data Generation Tool and Dataset
Goodarz Mehr
A. Eskandarian
61
1
0
04 Feb 2025
Boosting Multimodal Reasoning with MCTS-Automated Structured Thinking
Jinyang Wu
Mingkuan Feng
Shuai Zhang
Ruihan Jin
Feihu Che
Zengqi Wen
J. Tao
LRM
57
7
0
04 Feb 2025
A Hybrid Swarm Intelligence Approach for Optimizing Multimodal Large Language Models Deployment in Edge-Cloud-based Federated Learning Environments
Gaith Rjouba
Hanae Elmekki
Saidul Islam
Jamal Bentahar
Rachida Dssouli
36
0
0
04 Feb 2025
Position: Empowering Time Series Reasoning with Multimodal LLMs
Yaxuan Kong
Yiyuan Yang
Shiyu Wang
Chenghao Liu
Yuxuan Liang
Ming Jin
Stefan Zohren
Dan Pei
Y. Liu
Qingsong Wen
AI4TS
LRM
66
2
0
03 Feb 2025
Neuro-LIFT: A Neuromorphic, LLM-based Interactive Framework for Autonomous Drone FlighT at the Edge
Amogh Joshi
Sourav Sanyal
Kaushik Roy
61
2
0
31 Jan 2025
E2E-MFD: Towards End-to-End Synchronous Multimodal Fusion Detection
Jiaqing Zhang
Mingxiang Cao
Weiying Xie
Jie Lei
Daixun Li
Wenbo Huang
Yunsong Li
Xue Yang
43
4
0
28 Jan 2025
EDoRA: Efficient Weight-Decomposed Low-Rank Adaptation via Singular Value Decomposition
Hamid Nasiri
Peter Garraghan
36
1
0
21 Jan 2025
RALAD: Bridging the Real-to-Sim Domain Gap in Autonomous Driving with Retrieval-Augmented Learning
Jiacheng Zuo
Haibo Hu
Zikang Zhou
Yufei Cui
Ziquan Liu
Jianping Wang
Nan Guan
Jin Wang
Chun Jason Xue
58
0
0
21 Jan 2025
When language and vision meet road safety: leveraging multimodal large language models for video-based traffic accident analysis
Ruixuan Zhang
Beichen Wang
Juexiao Zhang
Zilin Bian
Chen Feng
K. Ozbay
36
2
0
17 Jan 2025