PosePilot: Steering Camera Pose for Generative World Models with Self-supervised Depth

Recent advancements in autonomous driving (AD) systems have highlighted the potential of world models in achieving robust and generalizable performance across both ordinary and challenging driving conditions. However, a key challenge remains: precise and flexible camera pose control, which is crucial for accurate viewpoint transformation and realistic simulation of scene dynamics. In this paper, we introduce PosePilot, a lightweight yet powerful framework that significantly enhances camera pose controllability in generative world models. Drawing inspiration from self-supervised depth estimation, PosePilot leverages structure-from-motion principles to establish a tight coupling between camera pose and video generation. Specifically, we incorporate self-supervised depth and pose readouts, allowing the model to infer depth and relative camera motion directly from video sequences. These outputs drive pose-aware frame warping, guided by a photometric warping loss that enforces geometric consistency across synthesized frames. To further refine camera pose estimation, we introduce a reverse warping step and a pose regression loss, improving viewpoint precision and adaptability. Extensive experiments on autonomous driving and general-domain video datasets demonstrate that PosePilot significantly enhances structural understanding and motion reasoning in both diffusion-based and auto-regressive world models. By steering camera pose with self-supervised depth, PosePilot sets a new benchmark for pose controllability, enabling physically consistent, reliable viewpoint synthesis in generative world models.
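The pose-aware frame warping and photometric warping loss described above can be illustrated with a minimal NumPy sketch. It is not PosePilot's implementation. It assumes a pinhole camera with intrinsics `K`, a 4x4 target-to-source transform `T_t2s`, bilinear sampling, and a plain masked L1 error; the paper's actual loss may include other terms. The function names `warp_source_to_target` and `photometric_loss` are hypothetical.

```python
import numpy as np

def warp_source_to_target(src, depth, K, T_t2s):
    """Inverse-warp a source image (H, W, C) into the target view.

    depth: (H, W) target-view depth; K: (3, 3) intrinsics (assumed pinhole);
    T_t2s: (4, 4) rigid transform from target camera frame to source frame.
    Returns the warped image and a boolean mask of pixels that land inside src.
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u.ravel(), v.ravel(), np.ones(H * W)]).astype(np.float64)

    # Back-project target pixels to 3D, then move them into the source frame.
    pts = (np.linalg.inv(K) @ pix) * depth.ravel()
    pts_src = T_t2s[:3, :3] @ pts + T_t2s[:3, 3:4]

    # Project into the source image plane.
    proj = K @ pts_src
    z = np.clip(proj[2], 1e-6, None)
    x, y = proj[0] / z, proj[1] / z

    eps = 1e-6  # tolerance for floating-point error at the image border
    valid = ((proj[2] > 1e-6) & (x >= -eps) & (x <= W - 1 + eps)
             & (y >= -eps) & (y <= H - 1 + eps))

    # Bilinear sampling of the source image at (x, y).
    x0 = np.clip(np.floor(x).astype(int), 0, W - 2)
    y0 = np.clip(np.floor(y).astype(int), 0, H - 2)
    wx = np.clip(x - x0, 0.0, 1.0)[:, None]
    wy = np.clip(y - y0, 0.0, 1.0)[:, None]
    top = src[y0, x0] * (1 - wx) + src[y0, x0 + 1] * wx
    bot = src[y0 + 1, x0] * (1 - wx) + src[y0 + 1, x0 + 1] * wx
    warped = (top * (1 - wy) + bot * wy).reshape(H, W, -1)
    return warped, valid.reshape(H, W)

def photometric_loss(tgt, src, depth, K, T_t2s):
    """Masked mean L1 error between the target frame and the warped source."""
    warped, mask = warp_source_to_target(src, depth, K, T_t2s)
    if not mask.any():
        return 0.0
    err = np.abs(tgt - warped).mean(axis=-1)
    return float(err[mask].mean())

# Toy check: a horizontal intensity ramp seen at unit depth. A sideways
# translation of 0.1 with focal length 10 shifts the image by one pixel.
H, W = 4, 6
K = np.array([[10.0, 0.0, 2.5], [0.0, 10.0, 1.5], [0.0, 0.0, 1.0]])
depth = np.ones((H, W))
src = np.tile(np.arange(W, dtype=np.float64), (H, 1))[..., None]
tgt = src + 1.0
T_true = np.eye(4)
T_true[0, 3] = 0.1

loss_true_pose = photometric_loss(tgt, src, depth, K, T_true)
loss_identity = photometric_loss(tgt, src, depth, K, np.eye(4))
```

In this toy setup the loss is lower under the correct relative pose than under the identity pose, which is the signal a photometric warping loss uses to tie generated frames to the requested camera motion.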
@article{jin2025_2505.01729,
  title={PosePilot: Steering Camera Pose for Generative World Models with Self-supervised Depth},
  author={Bu Jin and Weize Li and Baihan Yang and Zhenxin Zhu and Junpeng Jiang and Huan-ang Gao and Haiyang Sun and Kun Zhan and Hengtong Hu and Xueyang Zhang and Peng Jia and Hao Zhao},
  journal={arXiv preprint arXiv:2505.01729},
  year={2025}
}