HumanGif: Single-View Human Diffusion with Generative Prior

Abstract

Previous 3D human creation methods have made significant progress in synthesizing view-consistent and temporally aligned results from sparse-view images or monocular videos. However, it remains challenging to produce perceptually realistic, view-consistent, and temporally coherent human avatars from a single image, as limited information is available in the single-view input setting. Motivated by the success of 2D character animation, we propose HumanGif, a single-view human diffusion model with generative prior. Specifically, we formulate single-view-based 3D human novel view and pose synthesis as a single-view-conditioned human diffusion process, utilizing generative priors from foundational diffusion models to complement the missing information. To ensure fine-grained and consistent novel view and pose synthesis, we introduce a Human NeRF module in HumanGif to learn spatially aligned features from the input image, implicitly capturing the relative camera and human pose transformation. Furthermore, we introduce an image-level loss during optimization to bridge the gap between the latent and image spaces in diffusion models. Extensive experiments on the RenderPeople and DNA-Rendering datasets demonstrate that HumanGif achieves the best perceptual performance, with better generalizability for novel view and pose synthesis.
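
The image-level loss is only described at a high level in the abstract. Below is a minimal sketch, assuming a standard epsilon-prediction latent diffusion setup, of how such a loss could be combined with the usual latent denoising objective by decoding a predicted clean latent back to image space. The names `unet`, `vae`, `scheduler`, and `cond` are hypothetical placeholders, not the authors' actual interfaces.

import torch
import torch.nn.functional as F

def diffusion_losses(unet, vae, latents, image_gt, cond, timesteps, noise,
                     scheduler, w_img=0.1):
    """Sketch of a latent denoising loss plus an image-level loss.

    All module and scheduler objects are placeholders for a latent-diffusion
    backbone, its VAE decoder, single-view conditioning features, and a
    DDPM-style noise scheduler.
    """
    # Forward diffusion: corrupt the clean latents at the sampled timesteps.
    noisy_latents = scheduler.add_noise(latents, noise, timesteps)
    noise_pred = unet(noisy_latents, timesteps, cond)

    # Standard latent-space denoising objective (epsilon prediction).
    loss_latent = F.mse_loss(noise_pred, noise)

    # Estimate the clean latent from the predicted noise, then decode it so a
    # loss can also be imposed in image space against the ground-truth image.
    alpha_bar = scheduler.alphas_cumprod[timesteps].view(-1, 1, 1, 1)
    latents_pred = (noisy_latents - (1 - alpha_bar).sqrt() * noise_pred) / alpha_bar.sqrt()
    image_pred = vae.decode(latents_pred)

    # Image-level loss bridging the latent and image spaces.
    loss_image = F.mse_loss(image_pred, image_gt)

    return loss_latent + w_img * loss_image

In this sketch, the image-level term backpropagates through the VAE decoder, so the denoiser receives gradients that reflect pixel-space fidelity rather than latent-space error alone; the weight `w_img` is an assumed hyperparameter.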

@article{hu2025_2502.12080,
  title={HumanGif: Single-View Human Diffusion with Generative Prior},
  author={Shoukang Hu and Takuya Narihira and Kazumi Fukuda and Ryosuke Sawata and Takashi Shibuya and Yuki Mitsufuji},
  journal={arXiv preprint arXiv:2502.12080},
  year={2025}
}