A Simple and Generalist Approach for Panoptic Segmentation
Main: 8 Pages
9 Figures
Bibliography: 3 Pages
5 Tables
Appendix: 3 Pages
Abstract
Panoptic segmentation is an important computer vision task, and current state-of-the-art solutions require specialized components to perform well. We propose a simple generalist framework based on a deep-encoder, shallow-decoder architecture with per-pixel prediction; in essence, it fine-tunes a massively pretrained image model with minimal additional components. Applied naively, this method does not yield good results. We show that this is due to imbalance during training and propose a novel method for reducing it: centroid regression in the space of spectral positional embeddings. Our method achieves a panoptic quality (PQ) of 55.1 on the challenging MS-COCO dataset, which is state-of-the-art performance among generalist methods.
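For readers unfamiliar with the metric quoted above, the following is a minimal sketch of how panoptic quality is commonly defined for a single class: the sum of IoUs over matched segment pairs, divided by the number of true positives plus half the false positives and half the false negatives. The function name `panoptic_quality` and its arguments are illustrative assumptions, not code from the paper; the sketch also assumes that matching (a predicted and a ground-truth segment match when their IoU exceeds 0.5) has already been done. Benchmark numbers such as 55.1 are conventionally this quantity averaged over classes and scaled by 100.

```python
def panoptic_quality(matched_ious, num_false_pos, num_false_neg):
    """Per-class PQ from already-matched segments (hypothetical helper).

    matched_ious: IoU of each matched (prediction, ground truth) pair;
                  under the usual convention every match has IoU > 0.5.
    num_false_pos: predicted segments left unmatched.
    num_false_neg: ground-truth segments left unmatched.
    """
    if any(iou <= 0.5 or iou > 1.0 for iou in matched_ious):
        raise ValueError("matched pairs must have IoU in (0.5, 1.0]")

    num_true_pos = len(matched_ious)
    denominator = num_true_pos + 0.5 * num_false_pos + 0.5 * num_false_neg
    if denominator == 0:
        # No segments of this class in either prediction or ground truth.
        return 0.0
    return sum(matched_ious) / denominator


# Two matches plus one spurious prediction and one missed segment.
pq = panoptic_quality([0.8, 0.6], num_false_pos=1, num_false_neg=1)
print(f"PQ = {pq:.4f}")
```

Because unmatched segments enter only the denominator, both spurious predictions and missed instances lower the score even when the matched segments overlap well.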
