On the Road with GPT-4V(ision): Early Explorations of Visual-Language Model on Autonomous Driving

9 November 2023
Licheng Wen
Xuemeng Yang
Daocheng Fu
Xiaofeng Wang
Pinlong Cai
Xin Li
Tao Ma
Yingxuan Li
Linran Xu
Dengke Shang
Zheng Zhu
Shaoyan Sun
Yeqi Bai
Xinyu Cai
Min Dou
Shuanglu Hu
Botian Shi
Yu Qiao
    VLM

Papers citing "On the Road with GPT-4V(ision): Early Explorations of Visual-Language Model on Autonomous Driving"

50 / 56 papers shown
Exploring the Roles of Large Language Models in Reshaping Transportation Systems: A Survey, Framework, and Roadmap
Tong Nie
Jian-jun Sun
Wei Ma
58
1
0
27 Mar 2025
Embodied-Reasoner: Synergizing Visual Search, Reasoning, and Action for Embodied Interactive Tasks
W. Zhang
Mengna Wang
Gangao Liu
Xu Huixin
Yiwei Jiang
...
Hang Zhang
Xin Li
Weiming Lu
Peng Li
Y. Zhuang
LM&Ro
LRM
65
3
0
27 Mar 2025
NuPlanQA: A Large-Scale Dataset and Benchmark for Multi-View Driving Scene Understanding in Multi-Modal Large Language Models
Sung-Yeon Park
Can Cui
Yunsheng Ma
Ahmadreza Moradipari
Rohit Gupta
Kyungtae Han
Ziran Wang
34
0
0
17 Mar 2025
Towards Statistical Factuality Guarantee for Large Vision-Language Models
Z. Li
Chao Yan
Nicholas J. Jackson
Wendi Cui
B. Li
Jiaxin Zhang
Bradley Malin
69
0
0
27 Feb 2025
INSIGHT: Enhancing Autonomous Driving Safety through Vision-Language Models on Context-Aware Hazard Detection and Edge Case Evaluation
Dianwei Chen
Zifan Zhang
Yuchen Liu
X. Yang
VLM
52
2
0
01 Feb 2025
When language and vision meet road safety: leveraging multimodal large language models for video-based traffic accident analysis
Ruixuan Zhang
Beichen Wang
Juexiao Zhang
Zilin Bian
Chen Feng
K. Ozbay
39
2
0
17 Jan 2025
Ethical-Lens: Curbing Malicious Usages of Open-Source Text-to-Image Models
Yuzhu Cai
Sheng Yin
Yuxi Wei
Chenxin Xu
Weibo Mao
Felix Juefei Xu
Siheng Chen
Yanfeng Wang
EGVM
79
2
0
03 Jan 2025
LaVida Drive: Vision-Text Interaction VLM for Autonomous Driving with Token Selection, Recovery and Enhancement
Siwen Jiao
Yangyi Fang
Baoyun Peng
Wangqun Chen
Bharadwaj Veeravalli
76
4
0
20 Nov 2024
Automatically Generating Visual Hallucination Test Cases for Multimodal Large Language Models
Zhongye Liu
Hongbin Liu
Yuepeng Hu
Zedian Shao
Neil Zhenqiang Gong
VLM
MLLM
21
0
0
15 Oct 2024
Training-Free Open-Ended Object Detection and Segmentation via Attention as Prompts
Zhiwei Lin
Yongtao Wang
Zhi Tang
ObjD
VLM
28
2
0
08 Oct 2024
Uncertainty-Guided Enhancement on Driving Perception System via Foundation Models
Yunhao Yang
Yuxin Hu
Mao Ye
Zaiwei Zhang
Zhichao Lu
Yi Xu
Ufuk Topcu
Ben Snyder
26
2
0
02 Oct 2024
DRIVE: Dependable Robust Interpretable Visionary Ensemble Framework in Autonomous Driving
Songning Lai
Tianlang Xue
Hongru Xiao
Lijie Hu
Jiemin Wu
Ninghui Feng
Runwei Guan
Haicheng Liao
Zhenning Li
Yutao Yue
26
4
0
16 Sep 2024
Making Large Language Models Better Planners with Reasoning-Decision Alignment
Zhijian Huang
Tao Tang
Shaoxiang Chen
Sihao Lin
Zequn Jie
Lin Ma
Guangrun Wang
Xiaodan Liang
54
9
0
25 Aug 2024
AppAgent v2: Advanced Agent for Flexible Mobile Interactions
Yanda Li
Chi Zhang
Wanqi Yang
Bin-Bin Fu
Pei Cheng
Xin Chen
Ling Chen
Yunchao Wei
LLMAG
LM&Ro
31
9
0
05 Aug 2024
PG-Attack: A Precision-Guided Adversarial Attack Framework Against Vision Foundation Models for Autonomous Driving
Jiyuan Fu
Zhaoyu Chen
Kaixun Jiang
Haijing Guo
Shuyong Gao
Wenqiang Zhang
AAML
35
1
0
18 Jul 2024
Malicious Path Manipulations via Exploitation of Representation Vulnerabilities of Vision-Language Navigation Systems
Chashi Mahiul Islam
Shaeke Salman
M. Shams
Xiuwen Liu
Piyush Kumar
AAML
30
4
0
10 Jul 2024
MFE-ETP: A Comprehensive Evaluation Benchmark for Multi-modal Foundation Models on Embodied Task Planning
Min Zhang
Jianye Hao
Xian Fu
Peilong Han
Hao Zhang
Lei Shi
Hongyao Tang
Yan Zheng
53
1
0
06 Jul 2024
HEMM: Holistic Evaluation of Multimodal Foundation Models
Paul Pu Liang
Akshay Goindani
Talha Chafekar
Leena Mathur
Haofei Yu
Ruslan Salakhutdinov
Louis-Philippe Morency
36
10
0
03 Jul 2024
Hard Cases Detection in Motion Prediction by Vision-Language Foundation Models
Yi Yang
Qingwen Zhang
Kei Ikemura
Nazre Batool
John Folkesson
VLM
33
1
0
31 May 2024
VisionGraph: Leveraging Large Multimodal Models for Graph Theory Problems in Visual Context
Yunxin Li
Baotian Hu
Haoyuan Shi
Wei Wang
Longyue Wang
Min-Ling Zhang
LRM
27
12
0
08 May 2024
CityLLaVA: Efficient Fine-Tuning for VLMs in City Scenario
Zhizhao Duan
Hao Cheng
Duo Xu
Xi Wu
Xiangxie Zhang
Xi Ye
Zhen Xie
24
6
0
06 May 2024
Integration of Mixture of Experts and Multimodal Generative AI in Internet of Vehicles: A Survey
Minrui Xu
Dusit Niyato
Jiawen Kang
Zehui Xiong
Abbas Jamalipour
Yuguang Fang
Dong In Kim
Xuemin
X. Shen
23
5
0
25 Apr 2024
Physical Backdoor Attack can Jeopardize Driving with Vision-Large-Language Models
Zhenyang Ni
Rui Ye
Yuxian Wei
Zhen Xiang
Yanfeng Wang
Siheng Chen
AAML
32
9
0
19 Apr 2024
Automated Evaluation of Large Vision-Language Models on Self-driving Corner Cases
Kai Chen
Yanze Li
Wenhua Zhang
Yanxin Liu
Pengxiang Li
...
Xinhai Zhao
Zhenguo Li
Dit-Yan Yeung
Huchuan Lu
Xu Jia
ELM
MLLM
48
28
0
16 Apr 2024
Hallucination Detection in Foundation Models for Decision-Making: A Flexible Definition and Review of the State of the Art
Neeloy Chakraborty
Melkior Ornik
Katherine Driggs-Campbell
LRM
57
9
0
25 Mar 2024
Explore until Confident: Efficient Exploration for Embodied Question Answering
Allen Z. Ren
Jaden Clark
Anushri Dixit
Masha Itkina
Anirudha Majumdar
Dorsa Sadigh
40
28
0
23 Mar 2024
GPT as Psychologist? Preliminary Evaluations for GPT-4V on Visual Affective Computing
Hao Lu
Xuesong Niu
Jiyao Wang
Yin Wang
Qingyong Hu
...
Dengbo He
Shuiguang Deng
Hao Chen
Ying Chen
Shiguang Shan
MLLM
41
10
0
09 Mar 2024
A Survey on Human-AI Teaming with Large Pre-Trained Models
Vanshika Vats
Marzia Binta Nizam
Minghao Liu
Ziyuan Wang
Richard Ho
...
Celeste Shen
Rachel Shen
Nafisa Hussain
Kesav Ravichandran
James Davis
LM&MA
36
8
0
07 Mar 2024
Large Multimodal Agents: A Survey
Junlin Xie
Zhihong Chen
Ruifei Zhang
Xiang Wan
Guanbin Li
LM&Ro
LLMAG
37
38
0
23 Feb 2024
Scaffolding Coordinates to Promote Vision-Language Coordination in Large Multi-Modal Models
Xuanyu Lei
Zonghan Yang
Xinrui Chen
Peng Li
Yang Liu
MLLM
LRM
32
30
0
19 Feb 2024
Rec-GPT4V: Multimodal Recommendation with Large Vision-Language Models
Yuqing Liu
Yu Wang
Lichao Sun
Philip S. Yu
12
6
0
13 Feb 2024
PIN: Positional Insert Unlocks Object Localisation Abilities in VLMs
Michael Dorkenwald
Nimrod Barazani
Cees G. M. Snoek
Yuki M. Asano
VLM
MLLM
25
12
0
13 Feb 2024
PIVOT: Iterative Visual Prompting Elicits Actionable Knowledge for VLMs
Soroush Nasiriany
Fei Xia
Wenhao Yu
Ted Xiao
Jacky Liang
...
Karol Hausman
N. Heess
Chelsea Finn
Sergey Levine
Brian Ichter
LM&Ro
LRM
25
90
0
12 Feb 2024
Diffusion-ES: Gradient-free Planning with Diffusion for Autonomous Driving and Zero-Shot Instruction Following
Brian Yang
Huangyuan Su
N. Gkanatsios
Tsung-Wei Ke
Ayush Jain
Jeff Schneider
Katerina Fragkiadaki
DiffM
37
20
0
09 Feb 2024
Delving into Multi-modal Multi-task Foundation Models for Road Scene Understanding: From Learning Paradigm Perspectives
Sheng Luo
Wei-Neng Chen
Wanxin Tian
Rui Liu
Luanxuan Hou
...
Ling Shao
Yi Yang
Bojun Gao
Qun Li
Guobin Wu
47
13
0
05 Feb 2024
GPT-4V as Traffic Assistant: An In-depth Look at Vision Language Model on Complex Traffic Events
Xingcheng Zhou
Alois C. Knoll
17
8
0
03 Feb 2024
LimSim++: A Closed-Loop Platform for Deploying Multimodal LLMs in Autonomous Driving
Daocheng Fu
Wenjie Lei
Licheng Wen
Pinlong Cai
Song Mao
Min Dou
Botian Shi
Yu Qiao
38
27
0
02 Feb 2024
A Survey for Foundation Models in Autonomous Driving
Haoxiang Gao
Yaqian Li
Kaiwen Long
Ming Yang
Yiqing Shen
VLM
LRM
53
22
0
02 Feb 2024
Data-Centric Evolution in Autonomous Driving: A Comprehensive Survey of Big Data System, Data Mining, and Closed-Loop Technologies
Lincan Li
Wei Shao
Wei Dong
Yijun Tian
Qiming Zhang
Kaixiang Yang
Wenjie Zhang
18
8
0
23 Jan 2024
GPT4Ego: Unleashing the Potential of Pre-trained Models for Zero-Shot Egocentric Action Recognition
Guangzhao Dai
Xiangbo Shu
Wenhao Wu
Rui Yan
Jiachao Zhang
VLM
14
5
0
18 Jan 2024
Forging Vision Foundation Models for Autonomous Driving: Challenges, Methodologies, and Opportunities
Xu Yan
Haiming Zhang
Yingjie Cai
Jingming Guo
Weichao Qiu
...
Lihui Jiang
Wei Zhang
Hongbo Zhang
Dengxin Dai
Bingbing Liu
51
17
0
16 Jan 2024
Masked Modeling for Self-supervised Representation Learning on Vision and Beyond
Siyuan Li
Luyuan Zhang
Zedong Wang
Di Wu
Lirong Wu
...
Jun-Xiong Xia
Cheng Tan
Yang Liu
Baigui Sun
Stan Z. Li
SSL
29
13
0
31 Dec 2023
LingoQA: Video Question Answering for Autonomous Driving
Ana-Maria Marcu
Long Chen
Jan Hünermann
Alice Karnsund
Benoît Hanotte
...
Vijay Badrinarayanan
Alex Kendall
Jamie Shotton
Elahe Arani
Oleg Sinavski
21
31
0
21 Dec 2023
An Evaluation of GPT-4V and Gemini in Online VQA
Mengchen Liu
Chongyan Chen
Danna Gurari
MLLM
53
7
0
17 Dec 2023
GlitchBench: Can large multimodal models detect video game glitches?
Mohammad Reza Taesiri
Tianjun Feng
Anh Nguyen
C. Bezemer
MLLM
VLM
LRM
30
9
0
08 Dec 2023
LaMPilot: An Open Benchmark Dataset for Autonomous Driving with Language Model Programs
Yunsheng Ma
Can Cui
Xu Cao
Wenqian Ye
Peiran Liu
...
Rohit Gupta
Kyungtae Han
Aniket Bera
James M. Rehg
Ziran Wang
21
42
0
07 Dec 2023
Towards Knowledge-driven Autonomous Driving
Xin Li
Yeqi Bai
Pinlong Cai
Licheng Wen
Daocheng Fu
...
Yikang Li
Botian Shi
Yong-Jin Liu
Liang He
Yu Qiao
32
26
0
07 Dec 2023
RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback
Tianyu Yu
Yuan Yao
Haoye Zhang
Taiwen He
Yifeng Han
...
Xinyue Hu
Zhiyuan Liu
Hai-Tao Zheng
Maosong Sun
Tat-Seng Chua
MLLM
VLM
130
177
0
01 Dec 2023
GPT4Vis: What Can GPT-4 Do for Zero-shot Visual Recognition?
Wenhao Wu
Huanjin Yao
Mengxi Zhang
Yuxin Song
Wanli Ouyang
Jingdong Wang
VLM
22
29
0
27 Nov 2023
Charting New Territories: Exploring the Geographic and Geospatial Capabilities of Multimodal LLMs
Jonathan Roberts
Timo Lüddecke
Rehan Sheikh
Kai Han
Samuel Albanie
MLLM
11
26
0
24 Nov 2023