2401.00988
Cited By
Holistic Autonomous Driving Understanding by Bird's-Eye-View Injected Multi-Modal Large Models
2 January 2024
Xinpeng Ding
Jianhua Han
Hang Xu
Xiaodan Liang
Wei Zhang
Xiaomeng Li
Papers citing "Holistic Autonomous Driving Understanding by Bird's-Eye-View Injected Multi-Modal Large Models"
27 / 27 papers shown
Title
Extending Large Vision-Language Model for Diverse Interactive Tasks in Autonomous Driving
Zongchuang Zhao
Haoyu Fu
Dingkang Liang
Xin Zhou
Dingyuan Zhang
Hongwei Xie
Bing Wang
Xiang Bai
MLLM
VLM
39
0
0
13 May 2025
PaMi-VDPO: Mitigating Video Hallucinations by Prompt-Aware Multi-Instance Video Preference Learning
Xinpeng Ding
K. Zhang
Jianhua Han
Lanqing Hong
Hang Xu
X. Li
MLLM
VLM
77
0
0
08 Apr 2025
NuGrounding: A Multi-View 3D Visual Grounding Framework in Autonomous Driving
Fuhao Li
Huan Jin
Bin-Bin Gao
Liaoyuan Fan
Lihui Jiang
Long Zeng
63
0
0
28 Mar 2025
Fine-Grained Evaluation of Large Vision-Language Models in Autonomous Driving
Yue Li
Meng Tian
Zhenyu Lin
Jiangtong Zhu
Dechang Zhu
Haiqiang Liu
Zining Wang
Yueyi Zhang
Zhiwei Xiong
Xinhai Zhao
CoGe
VLM
78
0
0
27 Mar 2025
ST-VLM: Kinematic Instruction Tuning for Spatio-Temporal Reasoning in Vision-Language Models
Dohwan Ko
S. Kim
Yumin Suh
Vijay Kumar B.G
Minseo Yoon
Manmohan Chandraker
Hyunwoo J. Kim
LRM
38
0
0
25 Mar 2025
AutoDrive-QA: Automated Generation of Multiple-Choice Questions for Autonomous Driving Datasets Using Large Vision-Language Models
Boshra Khalili
Andrew W. Smyth
ELM
54
0
0
20 Mar 2025
Tracking Meets Large Multimodal Models for Driving Scenario Understanding
Ayesha Ishaq
Jean Lahoud
F. Khan
Salman Khan
Hisham Cholakkal
Rao Muhammad Anwer
54
0
0
18 Mar 2025
NuPlanQA: A Large-Scale Dataset and Benchmark for Multi-View Driving Scene Understanding in Multi-Modal Large Language Models
Sung-Yeon Park
Can Cui
Yunsheng Ma
Ahmadreza Moradipari
Rohit Gupta
Kyungtae Han
Ziran Wang
34
0
0
17 Mar 2025
DynRsl-VLM: Enhancing Autonomous Driving Perception with Dynamic Resolution Vision-Language Models
Xirui Zhou
Lianlei Shan
Xiaolin Gui
53
0
0
14 Mar 2025
A Framework for a Capability-driven Evaluation of Scenario Understanding for Multimodal Large Language Models in Autonomous Driving
Tin Stribor Sohn
Philipp Reis
Maximilian Dillitzer
Johannes Bach
Jason J. Corso
Eric Sax
ELM
LRM
49
0
0
14 Mar 2025
How Do Multimodal Large Language Models Handle Complex Multimodal Reasoning? Placing Them in An Extensible Escape Game
Z. Wang
Yurui Dong
Fuwen Luo
Minyuan Ruan
Zhili Cheng
C. L. P. Chen
Peng Li
Yang Liu
LRM
79
0
0
13 Mar 2025
DriveLMM-o1: A Step-by-Step Reasoning Dataset and Large Multimodal Model for Driving Scenario Understanding
Ayesha Ishaq
Jean Lahoud
Ketan More
Omkar Thawakar
Ritesh Thawkar
...
F. Khan
Hisham Cholakkal
Ivan Laptev
Rao Muhammad Anwer
Salman Khan
LRM
59
0
0
13 Mar 2025
PGAD: Prototype-Guided Adaptive Distillation for Multi-Modal Learning in AD Diagnosis
Yanfei Li
Teng Yin
Wenyi Shang
J. Liu
Xi Wang
Kaiyang Zhao
MedIm
34
0
0
05 Mar 2025
BEVDriver: Leveraging BEV Maps in LLMs for Robust Closed-Loop Driving
Katharina Winter
Mark Azer
Fabian B. Flohr
53
0
0
05 Mar 2025
Visual Large Language Models for Generalized and Specialized Applications
Yifan Li
Zhixin Lai
Wentao Bao
Zhen Tan
Anh Dao
Kewei Sui
Jiayi Shen
Dong Liu
Huan Liu
Yu Kong
VLM
83
10
0
06 Jan 2025
Look Twice Before You Answer: Memory-Space Visual Retracing for Hallucination Mitigation in Multimodal Large Language Models
Xin Zou
Yizhou Wang
Yibo Yan
Yuanhuiyi Lyu
Kening Zheng
...
Junkai Chen
Peijie Jiang
J. Liu
Chang Tang
Xuming Hu
78
7
0
04 Oct 2024
DualAD: Dual-Layer Planning for Reasoning in Autonomous Driving
Dingrui Wang
Marc Kaufeld
Johannes Betz
29
0
0
26 Sep 2024
Tokenize the World into Object-level Knowledge to Address Long-tail Events in Autonomous Driving
Ran Tian
Boyi Li
Xinshuo Weng
Yuxiao Chen
Edward Schmerling
Yue Wang
B. Ivanovic
Marco Pavone
29
13
0
01 Jul 2024
AD-H: Autonomous Driving with Hierarchical Agents
Zaibin Zhang
Shiyu Tang
Yuanhang Zhang
Talas Fu
Yifan Wang
Yang Liu
Dong Wang
Jing Shao
Lijun Wang
H. Lu
42
3
0
05 Jun 2024
OmniDrive: A Holistic Vision-Language Dataset for Autonomous Driving with Counterfactual Reasoning
Shihao Wang
Zhiding Yu
Xiaohui Jiang
Shiyi Lan
Min Shi
Nadine Chang
Jan Kautz
Ying Li
Jose M. Alvarez
LRM
31
47
0
02 May 2024
Graphic Design with Large Multimodal Model
Yutao Cheng
Zhao Zhang
Maoke Yang
Hui Nie
Chunyuan Li
Xinglong Wu
Jie Shao
36
10
0
22 Apr 2024
EtC: Temporal Boundary Expand then Clarify for Weakly Supervised Video Grounding with Multimodal Large Language Model
Guozhang Li
Xinpeng Ding
De-Chun Cheng
Jie Li
Nannan Wang
Xinbo Gao
17
1
0
05 Dec 2023
LLM4Drive: A Survey of Large Language Models for Autonomous Driving
Zhenjie Yang
Xiaosong Jia
Hongyang Li
Junchi Yan
ELM
24
92
0
02 Nov 2023
Boosting Weakly-Supervised Temporal Action Localization with Text Information
Guozhang Li
De-Chun Cheng
Xinpeng Ding
N. Wang
Xiaoyu Wang
Xinbo Gao
23
21
0
01 May 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM
MLLM
244
4,186
0
30 Jan 2023
DRAMA: Joint Risk Localization and Captioning in Driving
Srikanth Malla
Chiho Choi
Isht Dwivedi
Joonhyang Choi
Jiachen Li
94
85
0
22 Sep 2022
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li
Dongxu Li
Caiming Xiong
S. Hoi
MLLM
BDL
VLM
CLIP
382
4,010
0
28 Jan 2022