STSeg-Complex Video Object Segmentation: The 1st Solution for 4th PVUW MOSE Challenge

Segmenting video objects in complex scenarios is highly challenging, and the MOSE dataset has contributed significantly to progress in this field. This technical report details the STSeg solution proposed by the "imaplus" team. By finetuning SAM2 and the unsupervised model TMO on the MOSE dataset, the STSeg solution demonstrates remarkable advantages in handling complex object motions and long video sequences. In the inference phase, an Adaptive Pseudo-labels Guided Model Refinement Pipeline selects an appropriate model for processing each video. With the finetuned models and this inference pipeline, the STSeg solution achieved a J&F score of 87.26% on the test set of the 2025 4th PVUW Challenge MOSE Track, securing 1st place and advancing video object segmentation in complex scenarios.
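The abstract does not spell out how the pipeline chooses a model for each video, so the following is only an illustrative sketch and not the authors' implementation. It assumes that each candidate model yields per-frame binary masks for a video and that the candidate whose masks agree best with a set of pseudo-label masks is kept. Agreement is measured here with the region similarity J (intersection over union), the region half of the J&F score. The function names `jaccard` and `select_model`, the model keys, and the selection rule itself are all hypothetical.

```python
def jaccard(pred, ref):
    """Region similarity J: intersection over union of two binary masks.

    Masks are nested lists of 0/1 values with identical shapes.
    Two empty masks are treated as a perfect match (J = 1.0).
    """
    inter = 0
    union = 0
    for row_p, row_r in zip(pred, ref):
        for p, r in zip(row_p, row_r):
            if p and r:
                inter += 1
            if p or r:
                union += 1
    return 1.0 if union == 0 else inter / union


def select_model(candidates, pseudo_labels):
    """Pick the candidate whose masks best match the pseudo-labels.

    candidates: dict mapping a (hypothetical) model name to its list of
        per-frame masks for one video.
    pseudo_labels: list of per-frame pseudo-label masks for the same video.
    Assumed rule: highest mean J over frames wins.
    """
    def mean_j(masks):
        scores = [jaccard(m, p) for m, p in zip(masks, pseudo_labels)]
        return sum(scores) / len(scores)

    return max(candidates, key=lambda name: mean_j(candidates[name]))


# Toy one-frame video with 2x2 masks.
pseudo = [[[1, 1], [0, 0]]]
candidates = {
    "sam2_finetuned": [[[1, 1], [0, 0]]],
    "tmo_finetuned": [[[1, 0], [0, 0]]],
}
chosen = select_model(candidates, pseudo)
```

In this toy case the first candidate matches the pseudo-label exactly, so it is the one selected. A real pipeline would work on full-resolution mask arrays and might combine J with a contour measure.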
@article{song2025_2504.08306,
  title={STSeg-Complex Video Object Segmentation: The 1st Solution for 4th PVUW MOSE Challenge},
  author={Kehuan Song and Xinglin Xie and Kexin Zhang and Licheng Jiao and Lingling Li and Shuyuan Yang},
  journal={arXiv preprint arXiv:2504.08306},
  year={2025}
}