Navigation-Guided Sparse Scene Representation for End-to-End Autonomous Driving

International Conference on Learning Representations (ICLR), 2024

26 September 2024

Peidong Li

Dixiao Cui

Main: 10 Pages

13 Figures

Bibliography: 3 Pages

8 Tables

Appendix: 4 Pages

Abstract

End-to-End Autonomous Driving (E2EAD) methods typically rely on supervised perception tasks to extract explicit scene information (e.g., objects, maps). This reliance necessitates expensive annotations and constrains both deployment in real-time applications and data scalability. In this paper, we introduce SSR, a novel framework that uses only 16 navigation-guided tokens as a Sparse Scene Representation, efficiently extracting crucial scene information for E2EAD. Our method eliminates the need for human-designed supervised sub-tasks, allowing computational resources to concentrate on the elements directly related to navigation intent. We further introduce a temporal enhancement module that aligns predicted future scenes with actual future scenes through self-supervision. Compared to UniAD on nuScenes, SSR achieves a 27.2\% relative reduction in L2 error and a 51.6\% decrease in collision rate, with 10.9$\times$ faster inference and 13$\times$ faster training. Moreover, SSR outperforms VAD-Base by 48.6 points in driving score on CARLA's Town05 Long benchmark. This framework represents a significant leap in real-time autonomous driving systems and paves the way for future scalable deployment. Code is available at this https URL.
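
To make the idea of a small set of navigation-guided tokens concrete, the following is a minimal sketch, not the authors' implementation. It assumes that learned queries are shifted by an embedding of the navigation command and then cross-attend to dense scene features. The function name `sparse_scene_tokens`, the feature sizes, the three-command embedding table, and the additive conditioning are all hypothetical choices made for illustration; only the token count of 16 comes from the abstract.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def sparse_scene_tokens(dense_feats, nav_embedding, queries):
    """Compress dense scene features into a few tokens (hypothetical sketch).

    dense_feats:   (N, D) flattened dense scene features
    nav_embedding: (D,)   embedding of the navigation command
    queries:       (K, D) learned token queries
    Returns the (K, D) tokens and the (K, N) attention weights.
    """
    d = dense_feats.shape[-1]
    # Assumption: condition the queries on navigation intent by simple addition.
    guided_queries = queries + nav_embedding
    # Scaled dot-product cross-attention from K queries to N scene features.
    scores = guided_queries @ dense_feats.T / np.sqrt(d)
    attn = softmax(scores, axis=-1)
    return attn @ dense_feats, attn


rng = np.random.default_rng(0)
N, D, K = 50 * 50, 64, 16                 # grid size and width are assumed; K = 16 as in the abstract
dense = rng.standard_normal((N, D))       # stand-in for dense scene features
nav_table = rng.standard_normal((3, D))   # hypothetical commands: left, right, straight
queries = rng.standard_normal((K, D))     # random stand-in for learned queries

tokens, attn = sparse_scene_tokens(dense, nav_table[2], queries)
print(tokens.shape)  # (16, 64)
```

In this sketch the downstream planner would consume only the 16 tokens rather than all `N` dense features, which illustrates where savings in computation could come from under these assumptions.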