Temporal Triplane Transformers as Occupancy World Models

Abstract

Recent years have seen significant advances in world models, which primarily focus on learning fine-grained correlations between an agent's motion trajectory and the resulting changes in its surrounding environment. However, existing methods often struggle to capture such fine-grained correlations and achieve real-time predictions. To address this, we propose a new 4D occupancy world model for autonomous driving, termed T³Former. T³Former begins by pre-training a compact triplane representation that efficiently compresses the 3D semantically occupied environment. Next, T³Former extracts multi-scale temporal motion features from the historical triplane and employs an autoregressive approach to iteratively predict the next triplane changes. Finally, T³Former combines the triplane changes with the previous ones to decode them into future occupancy results and ego-motion trajectories. Experimental results demonstrate the superiority of T³Former, achieving 1.44× faster inference speed (26 FPS), while improving the mean IoU to 36.09 and reducing the mean absolute planning error to 1.0 meters.
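
The autoregressive step described in the abstract (predict the next triplane change, then add it to the previous triplane) can be illustrated with a minimal sketch. This is not the authors' implementation: `predict_delta` is a hypothetical placeholder that simply averages past frame-to-frame changes where the paper uses a transformer over multi-scale temporal motion features, and stacking the three planes into one array assumes, for simplicity, that they share a resolution.

```python
import numpy as np

def predict_delta(history):
    """Hypothetical stand-in for the learned predictor:
    returns the average frame-to-frame change over the history window."""
    diffs = [history[i + 1] - history[i] for i in range(len(history) - 1)]
    return np.mean(diffs, axis=0)

def rollout(history, steps):
    """Autoregressively extend a sequence of triplane feature arrays.

    Each step predicts a change from the most recent window of frames,
    adds it to the latest triplane, and feeds the result back in.
    """
    frames = list(history)
    window = len(history)
    future = []
    for _ in range(steps):
        delta = predict_delta(frames[-window:])
        next_triplane = frames[-1] + delta  # previous triplane + predicted change
        frames.append(next_triplane)
        future.append(next_triplane)
    return future

# Toy input (assumed shapes): 3 planes (xy, xz, yz), each 4x4 with 2 channels.
history = [np.full((3, 4, 4, 2), float(t)) for t in range(3)]
future = rollout(history, steps=2)
```

In the actual model, each predicted triplane would then be decoded into occupancy and an ego-motion trajectory; that decoder is omitted here.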

@article{xu2025_2503.07338,
  title={Temporal Triplane Transformers as Occupancy World Models},
  author={Haoran Xu and Peixi Peng and Guang Tan and Yiqian Chang and Yisen Zhao and Yonghong Tian},
  journal={arXiv preprint arXiv:2503.07338},
  year={2025}
}