B-repLer: Semantic B-rep Latent Editor using Large Language Models

Multimodal large language models (mLLMs), trained in a mixed-modal setting as universal models, have been shown to compete with or even outperform many specialized algorithms for imaging and graphics tasks. As demonstrated across many applications, mLLMs' ability to jointly process image and text data makes them suitable for zero-shot applications or efficient fine-tuning towards specialized tasks. However, they have had limited success in 3D analysis and editing tasks. This is due both to the lack of suitable (annotated) 3D data and to the idiosyncrasies of 3D representations. In this paper, we investigate whether mLLMs can be adapted to support high-level editing of Boundary Representation (B-rep) CAD objects. B-reps remain the industry standard for precisely encoding engineering objects, but are challenging to work with: the representation is fragile (i.e., edits can easily lead to invalid CAD objects), and no publicly available data source exists with semantically annotated B-reps or CAD construction history. We present B-repLer, a fine-tuned mLLM that can understand text prompts and make semantic edits on given B-reps to produce valid outputs. We enable this via a novel multimodal architecture, specifically designed to handle B-rep models, and demonstrate how existing CAD tools, in conjunction with mLLMs, can be used to automatically generate the required reasoning dataset, without relying on external annotations. We extensively evaluate B-repLer and demonstrate several text-based B-rep edits of varying complexity, which were not previously possible.