EX-4D: EXtreme Viewpoint 4D Video Synthesis via Depth Watertight Mesh

Generating high-quality camera-controllable videos from monocular input is a challenging task, particularly under extreme viewpoints. Existing methods often struggle with geometric inconsistencies and occlusion artifacts at boundaries, leading to degraded visual quality. In this paper, we introduce EX-4D, a novel framework that addresses these challenges through a Depth Watertight Mesh representation. The representation serves as a robust geometric prior by explicitly modeling both visible and occluded regions, ensuring geometric consistency under extreme camera poses. To overcome the lack of paired multi-view datasets, we propose a simulated masking strategy that generates effective training data from monocular videos alone. Additionally, a lightweight LoRA-based video diffusion adapter is employed to synthesize high-quality, physically consistent, and temporally coherent videos. Extensive experiments demonstrate that EX-4D outperforms state-of-the-art methods in physical consistency and extreme-view quality, enabling practical 4D video generation.
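The "lightweight LoRA-based adapter" mentioned above refers to the standard low-rank adaptation pattern, where a frozen base weight is augmented with a trainable low-rank update. A minimal sketch of that pattern in NumPy (all dimensions, names, and initializations here are illustrative, not taken from the EX-4D paper):

```python
import numpy as np

# Toy LoRA update: effective weight = frozen base + scaled low-rank delta.
# d_out, d_in, rank, and alpha are hypothetical values for illustration.
d_out, d_in, rank, alpha = 8, 8, 2, 4.0

rng = np.random.default_rng(0)
W = rng.normal(size=(d_out, d_in))          # frozen base weight (not trained)
A = rng.normal(size=(rank, d_in)) * 0.01    # trainable down-projection
B = np.zeros((d_out, rank))                 # trainable up-projection, zero-init

def lora_forward(x):
    # Base path plus low-rank adapter path, scaled by alpha / rank,
    # as in the common LoRA formulation.
    return W @ x + (alpha / rank) * (B @ (A @ x))

x = rng.normal(size=d_in)
# With B zero-initialized, the adapter contributes nothing before training:
assert np.allclose(lora_forward(x), W @ x)
```

Only `A` and `B` would be trained, which is why such adapters stay lightweight: the number of trainable parameters scales with the rank rather than the full weight dimensions.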
@article{hu2025_2506.05554,
  title={EX-4D: EXtreme Viewpoint 4D Video Synthesis via Depth Watertight Mesh},
  author={Tao Hu and Haoyang Peng and Xiao Liu and Yuewen Ma},
  journal={arXiv preprint arXiv:2506.05554},
  year={2025}
}