WorldPrompter: Traversable Text-to-Scene Generation

Abstract

Scene-level 3D generation is a challenging research topic: most existing methods generate only partial scenes and offer limited navigational freedom. We introduce WorldPrompter, a novel generative pipeline for synthesizing traversable 3D scenes from text prompts. We leverage panoramic videos as an intermediate representation to model the 360° details of a scene. WorldPrompter incorporates a conditional 360° panoramic video generator capable of producing a 128-frame video that simulates a person walking through and capturing a virtual environment. The resulting video is then reconstructed as Gaussian splats by a fast feedforward 3D reconstructor, enabling a truly walkable experience within the 3D scene. Experiments demonstrate that our panoramic video generation model achieves convincing view consistency across frames, enabling high-quality panoramic Gaussian splat reconstruction and facilitating traversal over an area of the scene. Qualitative and quantitative results also show that WorldPrompter outperforms state-of-the-art 360° video generators and 3D scene generation models.

@article{zhang2025_2504.02045,
  title={WorldPrompter: Traversable Text-to-Scene Generation},
  author={Zhaoyang Zhang and Yannick Hold-Geoffroy and Miloš Hašan and Chen Ziwen and Fujun Luan and Julie Dorsey and Yiwei Hu},
  journal={arXiv preprint arXiv:2504.02045},
  year={2025}
}