3D scene understanding has been transformed by open-vocabulary language models that enable interaction via natural language. However, the evaluation of these representations is limited to closed-set semantics that do not capture the richness of language. This work presents OpenLex3D, a dedicated benchmark for evaluating 3D open-vocabulary scene representations. OpenLex3D provides entirely new label annotations for 23 scenes from Replica, ScanNet++, and HM3D, which capture real-world linguistic variability by introducing synonymous object categories and additional nuanced descriptions. By introducing an open-set 3D semantic segmentation task and an object retrieval task, we provide insights into feature precision, segmentation, and downstream capabilities. We evaluate various existing 3D open-vocabulary methods on OpenLex3D, highlighting failure cases and avenues for improvement. The benchmark is publicly available at: this https URL.
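To make the open-vocabulary setting concrete, the sketch below shows one common way such representations are queried: per-point features from a 3D scene are compared by cosine similarity against text embeddings of an open label set, where several synonyms may refer to the same object. This is only a hypothetical illustration of the general mechanism, not the OpenLex3D evaluation code; all names, shapes, and the random stand-in features (which would normally come from a vision-language model such as CLIP) are assumptions.

```python
import numpy as np

def assign_labels(point_feats: np.ndarray, text_embs: np.ndarray) -> np.ndarray:
    """For each point, return the index of the best-matching text label
    by cosine similarity between point features and label embeddings."""
    p = point_feats / np.linalg.norm(point_feats, axis=1, keepdims=True)
    t = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    sims = p @ t.T                      # (num_points, num_labels)
    return sims.argmax(axis=1)

# Illustrative label set: synonyms ("sofa", "couch", "settee") may all be
# acceptable names for the same object in an open-vocabulary evaluation.
labels = ["sofa", "couch", "settee", "coffee table"]
point_feats = np.random.randn(1000, 512)        # stand-in per-point features
text_embs = np.random.randn(len(labels), 512)   # stand-in text embeddings
print(assign_labels(point_feats, text_embs)[:10])
```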
@article{kassab2025_2503.19764,
  title={OpenLex3D: A New Evaluation Benchmark for Open-Vocabulary 3D Scene Representations},
  author={Christina Kassab and Sacha Morin and Martin Büchner and Matías Mattamala and Kumaraditya Gupta and Abhinav Valada and Liam Paull and Maurice Fallon},
  journal={arXiv preprint arXiv:2503.19764},
  year={2025}
}