2302.00673
ADAPT: Action-aware Driving Caption Transformer
IEEE International Conference on Robotics and Automation (ICRA), 2023
1 February 2023
Bu Jin
Xinyi Liu
Yupeng Zheng
Pengfei Li
Hao Zhao
Tong Zhang
Yuhang Zheng
Guyue Zhou
Jingjing Liu
Github (398★)
Papers citing
"ADAPT: Action-aware Driving Caption Transformer"
50 / 63 papers shown
Title
OpenREAD: Reinforced Open-Ended Reasoning for End-to-End Autonomous Driving with LLM-as-Critic
Songyan Zhang
Wenhui Huang
Zhan Chen
Chua Jiahao Collister
Qihang Huang
Chen Lv
OffRL
LRM
104
1
0
01 Dec 2025
Evaluating Small Vision-Language Models on Distance-Dependent Traffic Perception
Nikos Theodoridis
Tim Brophy
Reenu Mohandas
Ganesh Sistu
Fiachra Collins
Anthony G. Scanlan
Ciarán Eising
VLM
LRM
116
1
0
09 Oct 2025
The Case for Negative Data: From Crash Reports to Counterfactuals for Reasonable Driving
Jay Patrikar
Apoorva Sharma
Sushant Veer
Boyi Li
Sebastian A. Scherer
Marco Pavone
92
0
0
23 Sep 2025
How Far are VLMs from Visual Spatial Intelligence? A Benchmark-Driven Perspective
S. Yu
Yuxin Chen
Hao Ju
Lianjie Jia
Fuxi Zhang
...
Lin Song
Lijun Wang
Yanwei Li
Y. Shan
Huchuan Lu
LRM
285
8
0
23 Sep 2025
LMAD: Integrated End-to-End Vision-Language Model for Explainable Autonomous Driving
Nan Song
Bozhou Zhang
Xiatian Zhu
Jiankang Deng
Li Zhang
VLM
98
2
0
17 Aug 2025
BEV-LLM: Leveraging Multimodal BEV Maps for Scene Captioning in Autonomous Driving
Felix Brandstaetter
Erik Schuetz
Katharina Winter
Fabian B. Flohr
117
1
0
25 Jul 2025
LaViPlan: Language-Guided Visual Path Planning with RLVR
Hayeon Oh
VLM
315
0
0
17 Jul 2025
AutoVLA: A Vision-Language-Action Model for End-to-End Autonomous Driving with Adaptive Reasoning and Reinforcement Fine-Tuning
Zewei Zhou
Tianhui Cai
Seth Z. Zhao
Yun Zhang
Zhiyu Huang
Bolei Zhou
Jiaqi Ma
LRM
VLM
261
54
0
16 Jun 2025
Impromptu VLA: Open Weights and Open Data for Driving Vision-Language-Action Models
Haohan Chi
Huan-ang Gao
Ziming Liu
Jianing Liu
Chenyu Liu
...
Leichen Wang
Xingtao Hu
Hao Sun
Hang Zhao
Hao Zhao
VLM
233
17
0
29 May 2025
Temporal Object Captioning for Street Scene Videos from LiDAR Tracks
Vignesh Gopinathan
Urs Zimmermann
Michael Arnold
Matthias Rottmann
174
0
0
22 May 2025
PADriver: Towards Personalized Autonomous Driving
Genghua Kou
Fan Jia
Weixin Mao
Wenshu Fan
Yucheng Zhao
Ziheng Zhang
Osamu Yoshie
Tiancai Wang
You Li
Xinming Zhang
305
2
0
08 May 2025
UncAD: Towards Safe End-to-end Autonomous Driving via Online Map Uncertainty
IEEE International Conference on Robotics and Automation (ICRA), 2025
Pengxuan Yang
Yupeng Zheng
Qichao Zhang
Kefei Zhu
Zebin Xing
Qiao Lin
Yun-Fu Liu
Zhiguo Su
Dongbin Zhao
178
9
0
17 Apr 2025
OpenDriveVLA: Towards End-to-end Autonomous Driving with Large Vision Language Action Model
Xingcheng Zhou
Xuyuan Han
Feng Yang
Yunpu Ma
Volker Tresp
Alois C. Knoll
VLM
372
59
0
30 Mar 2025
Chameleon: Fast-slow Neuro-symbolic Lane Topology Extraction
IEEE International Conference on Robotics and Automation (ICRA), 2025
Zongzheng Zhang
Xinrun Li
Sizhe Zou
Guoxuan Chi
Siqi Li
...
Guoliang Wang
Guantian Zheng
Leichen Wang
Hang Zhao
Hao Zhao
333
9
0
10 Mar 2025
GoalFlow: Goal-Driven Flow Matching for Multimodal Trajectories Generation in End-to-End Autonomous Driving
Computer Vision and Pattern Recognition (CVPR), 2025
Zebin Xing
Xinsong Zhang
Yang Hu
Bo Jiang
Tong He
Qian Zhang
Xiaoxiao Long
Wei Yin
464
45
0
07 Mar 2025
SafeAuto: Knowledge-Enhanced Safe Autonomous Driving with Multimodal Foundation Models
J.N. Zhang
Xuan Yang
Tianfu Wang
Yu Yao
Aleksandr Petiushko
B. Li
402
10
0
28 Feb 2025
DeepRTL: Bridging Verilog Understanding and Generation with a Unified Representation Model
International Conference on Learning Representations (ICLR), 2025
Yi Liu
Changran Xu
Yunhao Zhou
Zhiyu Li
Qiang Xu
VLM
294
17
0
20 Feb 2025
DriveLM: Driving with Graph Visual Question Answering
European Conference on Computer Vision (ECCV), 2024
Chonghao Sima
Katrin Renz
Kashyap Chitta
Lawrence Yunliang Chen
Hanxue Zhang
Chengen Xie
Jens Beißwenger
Ping Luo
Andreas Geiger
Hongyang Li
734
336
0
17 Jan 2025
Embodied Scene Understanding for Vision Language Models via MetaVQA
Computer Vision and Pattern Recognition (CVPR), 2025
Weizhen Wang
Chenda Duan
Zhenghao Peng
Yuxin Liu
Bolei Zhou
LM&Ro
247
7
0
17 Jan 2025
H-MBA: Hierarchical MamBa Adaptation for Multi-Modal Video Understanding in Autonomous Driving
AAAI Conference on Artificial Intelligence (AAAI), 2025
Tian Jin
Yuxiao Luo
Yue Ma
Yu Qiao
Yali Wang
Mamba
238
5
0
08 Jan 2025
Explanation for Trajectory Planning using Multi-modal Large Language Model for Autonomous Driving
Shota Yamazaki
Chenyu Zhang
Takuya Nanri
Akio Shigekane
Siyuan Wang
Jo Nishiyama
Tao Chu
Kohei Yokosawa
LRM
213
1
0
15 Nov 2024
Mini-InternVL: A Flexible-Transfer Pocket Multimodal Model with 5% Parameters and 90% Performance
Zhangwei Gao
Zhe Chen
Erfei Cui
Yiming Ren
Weiyun Wang
...
Lewei Lu
Tong Lu
Yu Qiao
Jifeng Dai
Wenhai Wang
VLM
359
83
0
21 Oct 2024
Robust RL with LLM-Driven Data Synthesis and Policy Adaptation for Autonomous Driving
Sihao Wu
Jiaxu Liu
Xiangyu Yin
Guangliang Cheng
Xingyu Zhao
Meng Fang
Xinping Yi
Xiaowei Huang
278
4
0
16 Oct 2024
Dual-AEB: Synergizing Rule-Based and Multimodal Large Language Models for Effective Emergency Braking
IEEE International Conference on Robotics and Automation (ICRA), 2024
Wei Zhang
Pengfei Li
Junli Wang
Bo Shen
Qihao Jin
...
Shibo Rui
Yang Yu
Wenchao Ding
Peng Li
Yilun Chen
210
1
0
11 Oct 2024
Efficient Driving Behavior Narration and Reasoning on Edge Device Using Large Language Models
IEEE Transactions on Vehicular Technology (IEEE Trans. Veh. Technol.), 2024
Yizhou Huang
Yihua Cheng
Kezhi Wang
LRM
132
3
0
30 Sep 2024
KARMA: Augmenting Embodied AI Agents with Long-and-short Term Memory Systems
IEEE International Conference on Robotics and Automation (ICRA), 2024
Zixuan Wang
Bo Yu
Junzhe Zhao
Wenhao Sun
Sai Hou
Shuai Liang
Xing Hu
Yinhe Han
Yiming Gan
350
11
0
23 Sep 2024
Traffic Scene Generation from Natural Language Description for Autonomous Vehicles with Large Language Model
Bo-Kai Ruan
Hao-Tang Tsui
Yung-Hui Li
Hong-Han Shuai
LM&Ro
431
14
0
15 Sep 2024
MulCPred: Learning Multi-modal Concepts for Explainable Pedestrian Action Prediction
Sensors, 2024
Yan Feng
Alexander Carballo
Keisuke Fujii
Robin Karlsson
Ming Ding
K. Takeda
136
0
0
14 Sep 2024
Hint-AD: Holistically Aligned Interpretability in End-to-End Autonomous Driving
Conference on Robot Learning (CoRL), 2024
Kairui Ding
Boyuan Chen
Yuchen Su
Huan-ang Gao
Bu Jin
...
Wuqiang Zhang
Xiaohui Li
Paul Barsch
Hongyang Li
Hang Zhao
196
19
0
10 Sep 2024
ChatSUMO: Large Language Model for Automating Traffic Scenario Generation in Simulation of Urban MObility
IEEE Transactions on Intelligent Vehicles (TIV), 2024
Shuyang Li
Talha Azfar
Ruimin Ke
LLMAG
215
32
0
29 Aug 2024
CoVLA: Comprehensive Vision-Language-Action Dataset for Autonomous Driving
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2024
Hidehisa Arai
Keita Miwa
Kento Sasaki
Yu Yamaguchi
Kohei Watanabe
Shunsuke Aoki
Issei Yamamoto
274
46
0
19 Aug 2024
Multi-Frame Vision-Language Model for Long-form Reasoning in Driver Behavior Analysis
Hiroshi Takato
Hiroshi Tsutsui
Komei Soda
Hidetaka Kamigaito
VLM
225
2
0
03 Aug 2024
Large Language Models for Human-like Autonomous Driving: A Survey
Yun Li
Kai Katsumata
Ehsan Javanmardi
Manabu Tsukada
LM&MA
179
15
0
27 Jul 2024
Tell Me Where You Are: Multimodal LLMs Meet Place Recognition
Zonglin Lyu
Juexiao Zhang
Mingxuan Lu
Yiming Li
Chen Feng
181
9
0
25 Jun 2024
Do More Details Always Introduce More Hallucinations in LVLM-based Image Captioning?
Mingqian Feng
Yunlong Tang
Zeliang Zhang
Chenliang Xu
170
6
0
18 Jun 2024
DriVLMe: Enhancing LLM-based Autonomous Driving Agents with Embodied and Social Experiences
Yidong Huang
Jacob Sansom
Ziqiao Ma
Felix Gervits
Joyce Chai
241
26
0
05 Jun 2024
PlanAgent: A Multi-modal Large Language Agent for Closed-loop Vehicle Motion Planning
Yupeng Zheng
Zebin Xing
Qichao Zhang
Bu Jin
Pengfei Li
...
Zhongpu Xia
Kun Zhan
Xianpeng Lang
Yaran Chen
Dongbin Zhao
LM&Ro
LRM
LLMAG
222
32
0
03 Jun 2024
Hard Cases Detection in Motion Prediction by Vision-Language Foundation Models
Yi Yang
Qingwen Zhang
Kei Ikemura
Nazre Batool
John Folkesson
VLM
167
3
0
31 May 2024
On the Utility of External Agent Intention Predictor for Human-AI Coordination
Adaptive Agents and Multi-Agent Systems (AAMAS), 2024
Chenxu Wang
Zilong Chen
Angelo Cangelosi
Huaping Liu
166
2
0
03 May 2024
Can Vehicle Motion Planning Generalize to Realistic Long-tail Scenarios?
Marcel Hallgarten
Julian Zapata
Martin Stoll
Katrin Renz
Andreas Zell
240
24
0
11 Apr 2024
Prompting Multi-Modal Tokens to Enhance End-to-End Autonomous Driving Imitation Learning with LLMs
Yiqun Duan
Qiang Zhang
Renjing Xu
295
18
0
07 Apr 2024
Idea-2-3D: Collaborative LMM Agents Enable 3D Model Generation from Interleaved Multimodal Inputs
Junhao Chen
Xiang Li
Xiaojun Ye
Chao Li
Zhaoxin Fan
Hao Zhao
VGen
3DV
356
6
0
05 Apr 2024
TOD3Cap: Towards 3D Dense Captioning in Outdoor Scenes
Bu Jin
Yupeng Zheng
Pengfei Li
Weize Li
Yuhang Zheng
...
Kun Zhan
Fu Liu
Xiaoxiao Long
Yilun Chen
Hao Zhao
3DV
244
36
0
28 Mar 2024
P-MapNet: Far-seeing Map Generator Enhanced by both SDMap and HDMap Priors
Zhou Jiang
Zhenxin Zhu
Pengfei Li
Huan-ang Gao
Tianyuan Yuan
Yongliang Shi
Hang Zhao
Hao Zhao
438
51
0
15 Mar 2024
MonoOcc: Digging into Monocular Semantic Occupancy Prediction
IEEE International Conference on Robotics and Automation (ICRA), 2024
Yupeng Zheng
Xiang Li
Pengfei Li
Yuhang Zheng
Bu Jin
Chengliang Zhong
Xiaoxiao Long
Hao Zhao
Qichao Zhang
184
43
0
13 Mar 2024
Embodied Understanding of Driving Scenarios
European Conference on Computer Vision (ECCV), 2024
Yunsong Zhou
Linyan Huang
Qingwen Bu
Jia Zeng
Tianyu Li
Hang Qiu
Hongzi Zhu
Minyi Guo
Yu Qiao
Hongyang Li
LM&Ro
195
51
0
07 Mar 2024
RAG-Driver: Generalisable Driving Explanations with Retrieval-Augmented In-Context Learning in Multi-Modal Large Language Model
Jianhao Yuan
Shuyang Sun
Daniel Omeiza
Bo Zhao
Paul Newman
Lars Kunze
Matthew Gadd
LRM
308
76
0
16 Feb 2024
Using Left and Right Brains Together: Towards Vision and Language Planning
Jun Cen
Chenfei Wu
Xiao Liu
Sheng-Siang Yin
Yixuan Pei
Jinglong Yang
Qifeng Chen
Nan Duan
Jianguo Zhang
222
9
0
16 Feb 2024
Delving into Multi-modal Multi-task Foundation Models for Road Scene Understanding: From Learning Paradigm Perspectives
IEEE Transactions on Intelligent Vehicles (TIV), 2024
Sheng Luo
Wei Chen
Wanxin Tian
Rui Liu
Luanxuan Hou
...
Ling Shao
Yi Yang
Bojun Gao
Qun Li
Guobin Wu
319
26
0
05 Feb 2024
Prospective Role of Foundation Models in Advancing Autonomous Vehicles
Jianhua Wu
B. Gao
Jincheng Gao
Jianhao Yu
Hongqing Chu
...
Xun Gong
Yi Chang
H. E. Tseng
Hong Chen
Jie Chen
285
16
0
08 Dec 2023