
Generative 6D Pose Estimation via Conditional Flow Matching

Amir Hamza
Davide Boscaini
Weihang Li
Benjamin Busam
Fabio Poiesi
Main: 5 pages
4 figures
Bibliography: 1 page
1 table
Abstract

Existing methods for instance-level 6D pose estimation typically rely on neural networks that either directly regress the pose in $\mathrm{SE}(3)$ or estimate it indirectly via local feature matching. The former struggle with object symmetries, while the latter fail in the absence of distinctive local features. To overcome these limitations, we propose a novel formulation of 6D pose estimation as a conditional flow matching problem in $\mathbb{R}^3$. We introduce Flose, a generative method that infers object poses via a denoising process conditioned on local features. While prior approaches based on conditional flow matching perform denoising solely based on geometric guidance, Flose integrates appearance-based semantic features to mitigate ambiguities caused by object symmetries. We further incorporate RANSAC-based registration to handle outliers. We validate Flose on five datasets from the established BOP benchmark. Flose outperforms prior methods with an average improvement of +4.5 Average Recall. Project website: this https URL
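To make the two stages named in the abstract concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes the denoising step can be viewed as Euler integration of a velocity field that moves noisy 3D points toward their posed locations, and that the pose is then recovered by RANSAC over Kabsch rigid fits. The names `integrate_flow`, `kabsch`, and `ransac_pose`, the step count, and the inlier threshold are all hypothetical; in the real method the velocity would come from a trained network conditioned on local features, which is not reproduced here.

```python
import numpy as np


def integrate_flow(x0, velocity, steps=10):
    """Euler-integrate dx/dt = velocity(x, t) from t=0 to t=1.

    `velocity` is a stand-in (assumption) for a learned, feature-conditioned network.
    """
    x = x0.copy()
    dt = 1.0 / steps
    for k in range(steps):
        x = x + dt * velocity(x, k * dt)
    return x


def kabsch(src, dst):
    """Least-squares rigid transform (R, t) such that dst ≈ src @ R.T + t."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t


def ransac_pose(src, dst, iters=200, thresh=0.01, seed=0):
    """RANSAC over 3-point Kabsch fits; refits on the largest inlier set."""
    rng = np.random.default_rng(seed)
    best = np.zeros(len(src), dtype=bool)
    for _ in range(iters):
        idx = rng.choice(len(src), size=3, replace=False)
        R, t = kabsch(src[idx], dst[idx])
        err = np.linalg.norm(src @ R.T + t - dst, axis=1)
        inliers = err < thresh
        if inliers.sum() > best.sum():
            best = inliers
    if best.sum() < 3:
        raise ValueError("RANSAC found no consistent pose hypothesis")
    R, t = kabsch(src[best], dst[best])
    return R, t, best
```

As a usage illustration, an oracle velocity `lambda x, t: (target - x) / (1.0 - t)` follows the straight-line path from noise to a known `target` point set; passing the object model points and the integrated points to `ransac_pose` then returns a rotation, a translation, and an inlier mask, with corrupted points rejected as outliers.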
