Large Language Models for Crash Detection in Video: A Survey of Methods, Datasets, and Challenges
Sanjeda Akter
Ibne Farabi Shihab
Anuj Sharma

Main text: 19 pages, 5 figures, 2 tables; bibliography: 2 pages
Abstract
Crash detection from video feeds is a critical problem in intelligent transportation systems. Recent developments in large language models (LLMs) and vision-language models (VLMs) have transformed how we process, reason about, and summarize multimodal information. This paper surveys recent methods leveraging LLMs for crash detection from video data. We present a structured taxonomy of fusion strategies, summarize key datasets, analyze model architectures, compare performance benchmarks, and discuss ongoing challenges and opportunities. Our review provides a foundation for future research in this fast-growing intersection of video understanding and foundation models.
@article{akter2025_2507.02074,
  title={Large Language Models for Crash Detection in Video: A Survey of Methods, Datasets, and Challenges},
  author={Sanjeda Akter and Ibne Farabi Shihab and Anuj Sharma},
  journal={arXiv preprint arXiv:2507.02074},
  year={2025}
}