LAVA: Language Driven Scalable and Versatile Traffic Video Analytics

26 July 2025

Yanrui Yu


Main: 8 pages

6 figures

Bibliography: 1 page

5 tables

Abstract

In modern urban environments, camera networks generate massive amounts of operational footage, reaching petabytes each day, which makes scalable video analytics essential for efficient processing. Many existing approaches adopt an SQL-based paradigm for querying such large-scale video databases; however, this constrains queries to rigid patterns with predefined semantic categories, significantly limiting analytical flexibility. In this work, we explore a language-driven video analytics paradigm aimed at enabling flexible and efficient natural language querying of high-volume video data. In particular, we build LAVA, a system that accepts natural language queries and retrieves traffic targets across multiple levels of granularity and arbitrary categories. LAVA comprises three main components: 1) a multi-armed bandit-based efficient sampling method for video segment-level localization;
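
As a rough illustration of the first component named in the abstract, the sketch below treats each video segment as a bandit arm and spends a fixed frame-inspection budget with the standard UCB1 rule. It is a generic textbook bandit, not the sampling method used in the paper, whose details the abstract does not give. The function name `ucb1_segment_sampling`, the `is_match` callback (standing in for whatever model judges a frame against the query), and the `budget` parameter are all hypothetical names introduced for this example.

```python
import math
import random


def ucb1_segment_sampling(segments, is_match, budget, seed=0):
    """Spend `budget` frame inspections across segments using UCB1.

    segments: list of non-empty frame lists; each segment is one bandit arm.
    is_match: hypothetical callable frame -> bool (e.g. a detector's verdict).
    Returns (pulls, hits): per-segment counts of inspected and matching frames.
    """
    rng = random.Random(seed)
    n = len(segments)
    pulls = [0] * n
    hits = [0] * n

    def ucb_score(i, t):
        # Empirical match rate plus the UCB1 exploration bonus.
        return hits[i] / pulls[i] + math.sqrt(2.0 * math.log(t) / pulls[i])

    for t in range(1, budget + 1):
        if t <= n:
            arm = t - 1  # warm-up: inspect each segment once
        else:
            arm = max(range(n), key=lambda i: ucb_score(i, t))
        # Frames are drawn with replacement to keep the sketch short.
        frame = rng.choice(segments[arm])
        pulls[arm] += 1
        hits[arm] += int(is_match(frame))

    return pulls, hits
```

Under these assumptions, segments whose sampled frames keep matching the query accumulate most of the budget, while unpromising segments are revisited only occasionally. A practical system would also have to sample without replacement and account for the cost of each model call, which this sketch ignores.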
