Vector Quantized Feature Fields for Fast 3D Semantic Lifting

Abstract

We generalize 3D lifting to semantic lifting by incorporating per-view masks that indicate which pixels are relevant to the lifting task. These masks are obtained by querying corresponding multiscale pixel-aligned feature maps, derived from scene representations such as distilled feature fields and feature point clouds. However, storing per-view feature maps rendered from distilled feature fields is impractical, and feature point clouds are expensive to store and query. To enable lightweight on-demand retrieval of pixel-aligned relevance masks, we introduce the Vector-Quantized Feature Field and demonstrate its effectiveness on complex indoor and outdoor scenes. Paired with a Vector-Quantized Feature Field, semantic lifting enables a range of applications in scene representation and embodied intelligence; in particular, we show that it supports text-driven localized scene editing and significantly improves the efficiency of embodied question answering.
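
The abstract does not spell out the retrieval mechanics, but the core idea lends itself to a short sketch: per-pixel features are vector-quantized against a small codebook, so each view stores only an index map, and a relevance mask for a query embedding can be recovered by scoring the handful of codebook entries rather than every pixel feature. The code below is an illustrative assumption, not the authors' implementation; the k-means quantizer, codebook size, feature dimension, and cosine-similarity thresholding are all hypothetical choices.

import numpy as np

def build_codebook(features, k=64, iters=10, seed=0):
    # Toy k-means over (N, D) features; returns a (k, D) codebook.
    rng = np.random.default_rng(seed)
    codebook = features[rng.choice(len(features), size=k, replace=False)].copy()
    for _ in range(iters):
        # Assign each feature to its nearest code (squared L2 distance).
        d = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
        assign = d.argmin(axis=1)
        for c in range(k):
            members = features[assign == c]
            if len(members):
                codebook[c] = members.mean(axis=0)
    return codebook

def quantize(feature_map, codebook):
    # Replace each pixel feature (H, W, D) with a codebook index (H, W).
    H, W, D = feature_map.shape
    flat = feature_map.reshape(-1, D)
    d = ((flat[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return d.argmin(axis=1).reshape(H, W)

def relevance_mask(index_map, codebook, query, threshold=0.5):
    # Score only the k codebook entries against the query embedding,
    # then broadcast the per-code scores through the index map.
    codes = codebook / np.linalg.norm(codebook, axis=1, keepdims=True)
    q = query / np.linalg.norm(query)
    sims = codes @ q                      # (k,) cosine similarity per code
    return sims[index_map] > threshold    # (H, W) boolean relevance mask

# Usage with random stand-ins for rendered features and a text embedding.
H, W, D = 64, 64, 32
feats = np.random.randn(H * W, D).astype(np.float32)
codebook = build_codebook(feats, k=64)
index_map = quantize(feats.reshape(H, W, D), codebook)
mask = relevance_mask(index_map, codebook, np.random.randn(D).astype(np.float32))

Under these assumptions, per-view storage drops from H x W x D floats to H x W small integers plus one shared codebook, and a query touches only k code vectors instead of every pixel, which is the kind of lightweight on-demand retrieval the abstract describes.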

@article{tang2025_2503.06469,
  title={Vector Quantized Feature Fields for Fast 3D Semantic Lifting},
  author={George Tang and Aditya Agarwal and Weiqiao Han and Trevor Darrell and Yutong Bai},
  journal={arXiv preprint arXiv:2503.06469},
  year={2025}
}