Viewpoint Equivariance for Multi-View 3D Object Detection

Computer Vision and Pattern Recognition (CVPR), 2023

25 March 2023

Di Chen

Jiajian Li

Vitor Campagnolo Guizilini

Rares Andrei Ambrus

Adrien Gaidon

Abstract

3D object detection from visual sensors is a cornerstone capability of robotic systems. State-of-the-art methods focus on reasoning and decoding object bounding boxes from multi-view camera input. In this work, we draw intuition from the integral role of multi-view consistency in 3D scene understanding and geometric learning. To this end, we introduce VEDet, a novel 3D object detection framework that exploits 3D multi-view geometry to improve localization through viewpoint awareness and equivariance. VEDet leverages a query-based transformer architecture and encodes the 3D scene by augmenting image features with positional encodings from their 3D perspective geometry. We design view-conditioned queries at the output level, which enable the generation of multiple virtual frames during training to learn viewpoint equivariance by enforcing multi-view consistency. The multi-view geometry, injected at the input level as positional encodings and regularized at the loss level, provides rich geometric cues for 3D object detection, leading to state-of-the-art performance on the nuScenes benchmark. The code and model are available at https://github.com/TRI-ML/VEDet.
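As an illustration of what viewpoint equivariance could mean for box predictions, the following is a minimal NumPy sketch and not the authors' implementation. It assumes boxes are reduced to a center and a yaw angle, that a virtual frame is defined by a yaw rotation and an origin expressed in the ego frame, and that consistency is measured with a plain L1 penalty. All function names (`yaw_rotation`, `to_virtual_frame`, `consistency_loss`) are hypothetical.

```python
import numpy as np

def wrap_angle(a):
    """Wrap angles to the interval [-pi, pi)."""
    return (a + np.pi) % (2.0 * np.pi) - np.pi

def yaw_rotation(theta):
    """3x3 rotation about the z-axis by theta radians."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

def to_virtual_frame(centers, yaws, view_yaw, view_origin):
    """Re-express ego-frame boxes in a virtual frame (assumed parameterization).

    centers: (N, 3) box centers in the ego frame
    yaws: (N,) box headings in the ego frame
    view_yaw, view_origin: pose of the virtual frame in the ego frame
    """
    R = yaw_rotation(view_yaw)
    # p_virtual = R^T (p_ego - t); with row vectors this is (p - t) @ R.
    centers_v = (centers - view_origin) @ R
    yaws_v = wrap_angle(yaws - view_yaw)
    return centers_v, yaws_v

def consistency_loss(pred_ego, pred_virtual, view_yaw, view_origin):
    """L1 gap between virtual-frame predictions and transformed ego-frame ones."""
    centers_e, yaws_e = pred_ego
    centers_v, yaws_v = pred_virtual
    centers_t, yaws_t = to_virtual_frame(centers_e, yaws_e, view_yaw, view_origin)
    center_term = np.abs(centers_v - centers_t).mean()
    yaw_term = np.abs(wrap_angle(yaws_v - yaws_t)).mean()
    return center_term + yaw_term

# Toy data: two boxes and one virtual view.
centers = np.array([[1.0, 0.0, 0.0], [4.0, 2.0, 0.5]])
yaws = np.array([0.0, 1.0])
view_yaw, view_origin = np.pi / 2, np.array([0.5, -1.0, 0.0])

# A perfectly equivariant predictor outputs the transformed boxes exactly.
ideal_virtual = to_virtual_frame(centers, yaws, view_yaw, view_origin)
loss = consistency_loss((centers, yaws), ideal_virtual, view_yaw, view_origin)
```

In this toy setup the loss is zero for predictions that transform exactly with the viewpoint and grows as the virtual-frame predictions drift from the transformed ego-frame ones. The actual training objective and box parameterization used by VEDet are described in the paper and repository.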