SynthVerse: A Large-Scale Diverse Synthetic Dataset for Point Tracking

4 February 2026

Weiguang Zhao

Haoran Xu

Xingyu Miao

Qin Zhao

Rui Zhang

Kaizhu Huang

Ning Gao

Peizhou Cao

Mingze Sun

Mulin Yu

Tao Lu

Linning Xu

Junting Dong

Jiangmiao Pang

Main: 9 Pages

6 Figures

Bibliography: 3 Pages

6 Tables

Abstract

Point tracking aims to follow visual points through complex motion, occlusion, and viewpoint changes, and has advanced rapidly with modern foundation models. Yet progress toward general point tracking remains constrained by limited high-quality data, as existing datasets often provide insufficient diversity and imperfect trajectory annotations. To address this gap, we introduce SynthVerse, a large-scale, diverse synthetic dataset specifically designed for point tracking. SynthVerse includes several new domains and object types missing from existing synthetic datasets, such as animated-film-style content, embodied manipulation, scene navigation, and articulated objects. It substantially expands dataset diversity by covering a broader range of object categories and providing high-quality dynamic motions and interactions, enabling more robust training and evaluation for general point tracking. In addition, we establish a highly diverse point tracking benchmark to systematically evaluate state-of-the-art methods under broader domain shifts. Extensive experiments and analyses demonstrate that training with SynthVerse yields consistent improvements in generalization, and they reveal limitations of existing trackers under diverse settings.
