
Deflickering Vision-Based Occupancy Networks through Lightweight Spatio-Temporal Correlation

Main: 10 pages · 7 figures · 9 tables · Bibliography: 3 pages · Appendix: 1 page
Abstract

Vision-based occupancy networks (VONs) provide an end-to-end solution for reconstructing 3D environments in autonomous driving. However, existing methods often suffer from temporal inconsistencies, manifesting as flickering effects that compromise the visual experience and adversely affect decision-making. While recent approaches have incorporated historical data to mitigate this issue, they often incur high computational costs and may introduce noisy information that interferes with object detection. We propose OccLinker, a novel plugin framework designed to integrate seamlessly with existing VONs and boost their performance. Our method efficiently consolidates historical static and motion cues, learns sparse latent correlations with current features through a dual cross-attention mechanism, and produces correction occupancy components to refine the base network's predictions. We further propose a new temporal consistency metric to quantitatively identify flickering effects. Extensive experiments on two benchmark datasets demonstrate that our method delivers superior performance with negligible computational overhead, while effectively eliminating flickering artifacts.
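To make the plugin idea concrete, below is a minimal sketch of how a dual cross-attention correction module could be wired onto a base occupancy network. This is not the authors' OccLinker implementation: the class name `OccCorrector`, the feature shapes, and the additive-correction fusion are all assumptions chosen for illustration, using standard PyTorch layers.

```python
# Minimal sketch (not the paper's implementation): current occupancy features
# cross-attend separately to consolidated historical static cues and motion
# cues; the two results are fused into a correction added to the base features.
import torch
import torch.nn as nn


class OccCorrector(nn.Module):
    """Dual cross-attention correction plugin (illustrative only)."""

    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        self.static_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.motion_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.fuse = nn.Linear(2 * dim, dim)

    def forward(self, cur, static_hist, motion_hist):
        # cur:          (B, N, C) current features from the base VON
        # static_hist:  (B, M, C) consolidated historical static cues
        # motion_hist:  (B, M, C) consolidated historical motion cues
        s, _ = self.static_attn(cur, static_hist, static_hist)
        m, _ = self.motion_attn(cur, motion_hist, motion_hist)
        correction = self.fuse(torch.cat([s, m], dim=-1))
        return cur + correction  # refined features fed to the occupancy head


if __name__ == "__main__":
    # Toy shapes purely for the sketch; real feature dimensions would come
    # from the base occupancy network.
    B, N, M, C = 2, 1024, 512, 256
    corrector = OccCorrector(dim=C)
    refined = corrector(torch.randn(B, N, C),
                        torch.randn(B, M, C),
                        torch.randn(B, M, C))
    print(refined.shape)  # torch.Size([2, 1024, 256])
```

The sketch only illustrates the plugin pattern described in the abstract: the base network stays frozen as-is, and a lightweight module consumes its features plus historical cues to produce a correction, keeping the added compute small.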
