Normal Transformer: Extracting Surface Geometry from LiDAR Points Enhanced by Visual Semantics

High-quality surface normals can help improve geometry estimation in problems faced by autonomous vehicles, such as collision avoidance and occlusion inference. While a considerable body of literature addresses densely scanned indoor scenes, normal estimation for autonomous driving remains difficult because real-world LiDAR scans are sparse, non-uniform, and noisy. In this paper, we introduce a multi-modal technique that leverages 3D point clouds and 2D colour images, obtained from LiDAR and camera sensors respectively, for surface normal estimation. We present the Hybrid Geometric Transformer (HGT), a novel transformer-based neural network architecture that fuses visual semantic and 3D geometric information, together with an effective learning strategy for the multi-modal data. Experimental results demonstrate that our information fusion approach outperforms existing methods. We also verify that the proposed model can learn from a simulated 3D environment that mimics traffic scenes: the learned geometric knowledge transfers to real-world scenes in the KITTI dataset. Downstream tasks built upon the estimated normal vectors in the KITTI dataset further show the advantage of the proposed estimator over existing methods.
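Since the abstract does not detail HGT's internals, the following is a minimal, hypothetical PyTorch sketch of one way a transformer block could fuse the two modalities: LiDAR point tokens query image feature tokens via cross-attention, and a linear head predicts unit-length per-point normals. All module names, feature dimensions, and the overall layout are illustrative assumptions, not the paper's actual architecture.

```python
# Minimal sketch of cross-modal fusion for per-point normal estimation.
# All names and dimensions below are illustrative assumptions; the paper's
# actual HGT architecture is not specified in this abstract.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossModalNormalBlock(nn.Module):
    """Point tokens (3D geometry) attend to image tokens (visual semantics)."""
    def __init__(self, dim=128, heads=4, img_feat_dim=256):
        super().__init__()
        self.point_embed = nn.Linear(3, dim)          # xyz -> point token
        self.img_proj = nn.Linear(img_feat_dim, dim)  # assumed CNN feature dim
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(),
                                 nn.Linear(dim, dim))
        self.head = nn.Linear(dim, 3)                 # predict a 3D normal

    def forward(self, points, img_feats):
        # points:    (B, N, 3)   LiDAR coordinates
        # img_feats: (B, M, 256) flattened image features (e.g. at pixels
        #                        that LiDAR points project onto)
        q = self.point_embed(points)
        kv = self.img_proj(img_feats)
        fused, _ = self.cross_attn(q, kv, kv)         # geometry queries semantics
        fused = fused + self.ffn(fused)
        return F.normalize(self.head(fused), dim=-1)  # unit-length normals

# Usage: 1024 LiDAR points, 300 image tokens, batch of 2
net = CrossModalNormalBlock()
normals = net(torch.randn(2, 1024, 3), torch.randn(2, 300, 256))
print(normals.shape)  # torch.Size([2, 1024, 3])
```

Under the sim-to-real setup the abstract describes, such a network could plausibly be trained by minimising a cosine distance between predicted and ground-truth normals rendered from the simulated traffic scenes, though the paper's actual loss and training strategy are not given here.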
@article{lin2025_2211.10580,
  title   = {Normal Transformer: Extracting Surface Geometry from LiDAR Points Enhanced by Visual Semantics},
  author  = {Ancheng Lin and Jun Li and Yusheng Xiang and Wei Bian and Mukesh Prasad},
  journal = {arXiv preprint arXiv:2211.10580},
  year    = {2025}
}