Recently, 3D Gaussian Splatting (3DGS) has shown encouraging performance on open vocabulary scene understanding tasks. However, previous methods cannot distinguish 3D instance-level information: they usually predict a similarity heatmap between the scene features and a text query. In this paper, we propose PanoGS, a novel and effective approach to 3D panoptic open vocabulary scene understanding. Technically, to learn accurate 3D language features that scale to large indoor scenes, we adopt a pyramid tri-plane to model a latent continuous parametric feature space and use a 3D feature decoder to regress the multi-view fused 2D feature cloud. In addition, we propose language-guided graph cuts that synergistically leverage reconstructed geometry and learned language cues to group 3D Gaussian primitives into a set of super-primitives. To obtain 3D-consistent instances, we perform graph-clustering-based segmentation with SAM-guided edge affinity computation between super-primitives. Extensive experiments on widely used datasets show better or competitive performance on 3D panoptic open vocabulary scene understanding. Project page: this https URL.
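To make the pyramid tri-plane idea concrete, below is a minimal sketch (not the authors' released code) of one plausible realization: a 3D point is projected onto the XY, XZ, and YZ feature planes at several resolutions, bilinearly interpolated features are summed across planes and concatenated across pyramid levels, and a small MLP decoder regresses the language feature. All module names, resolutions, and channel sizes here are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidTriPlane(nn.Module):
    def __init__(self, resolutions=(64, 128, 256), channels=32, out_dim=512):
        super().__init__()
        # Three learnable 2D feature planes (XY, XZ, YZ) per pyramid level.
        self.planes = nn.ParameterList(
            nn.Parameter(0.01 * torch.randn(3, channels, r, r)) for r in resolutions
        )
        # Decoder mapping concatenated multi-level features to a language feature
        # that can be compared against the multi-view fused 2D feature cloud.
        self.decoder = nn.Sequential(
            nn.Linear(channels * len(resolutions), 256), nn.ReLU(),
            nn.Linear(256, out_dim),
        )

    def forward(self, xyz):  # xyz: (N, 3) points normalized to [-1, 1]
        feats = []
        for planes in self.planes:
            # Project each point onto the three axis-aligned planes.
            coords = torch.stack(
                [xyz[:, [0, 1]], xyz[:, [0, 2]], xyz[:, [1, 2]]], dim=0
            )  # (3, N, 2)
            # grid_sample expects a (B, H_out, W_out, 2) grid; use a 1xN strip.
            sampled = F.grid_sample(
                planes, coords.unsqueeze(1), align_corners=True
            )  # (3, C, 1, N)
            feats.append(sampled.sum(dim=0).squeeze(1).t())  # sum planes -> (N, C)
        return self.decoder(torch.cat(feats, dim=-1))  # (N, out_dim)

Training such a decoder against a fused 2D feature cloud would amount to regressing, for each 3D point, the language feature lifted from the views that observe it.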
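The instance-grouping step can likewise be sketched. The following is an illustrative (not the paper's exact) formulation of graph clustering with SAM-guided edge affinity: each super-primitive is a node, the affinity of an edge is taken as the fraction of co-visible views in which both endpoints fall inside the same SAM mask, and edges above a threshold are contracted with union-find. The inputs same_mask_votes and covisible_counts are assumed to come from projecting super-primitives into the 2D views.

import numpy as np

def cluster_super_primitives(edges, same_mask_votes, covisible_counts,
                             num_nodes, tau=0.5):
    """edges: (E, 2) int array of adjacent super-primitive index pairs.
    same_mask_votes[e]: #views where both endpoints land in one SAM mask.
    covisible_counts[e]: #views where both endpoints are visible."""
    parent = np.arange(num_nodes)

    def find(x):  # union-find with path compression
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # SAM-guided edge affinity: agreement ratio over co-visible views.
    affinity = same_mask_votes / np.maximum(covisible_counts, 1)
    for (u, v), a in zip(edges, affinity):
        if a >= tau:  # merge super-primitives the 2D masks agree on
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv
    # Relabel roots to consecutive instance ids.
    roots = np.array([find(i) for i in range(num_nodes)])
    _, labels = np.unique(roots, return_inverse=True)
    return labels  # per-super-primitive instance id

Because every Gaussian primitive inherits the label of its super-primitive, this yields 3D-consistent instance masks without per-view post-processing.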
@article{zhai2025_2503.18107,
  title   = {PanoGS: Gaussian-based Panoptic Segmentation for 3D Open Vocabulary Scene Understanding},
  author  = {Hongjia Zhai and Hai Li and Zhenzhe Li and Xiaokun Pan and Yijia He and Guofeng Zhang},
  journal = {arXiv preprint arXiv:2503.18107},
  year    = {2025}
}