VD3D: Taming Large Video Diffusion Transformers for 3D Camera Control

17 July 2024
Sherwin Bahmani
Ivan Skorokhodov
Aliaksandr Siarohin
Willi Menapace
Guocheng Qian
Michael Vasilkovsky
Hsin-Ying Lee
Chaoyang Wang
Jiaxu Zou
Andrea Tagliasacchi
David B. Lindell
Sergey Tulyakov
Abstract

Modern text-to-video synthesis models demonstrate coherent, photorealistic generation of complex videos from a text description. However, most existing models lack fine-grained control over camera movement, which is critical for downstream applications related to content creation, visual effects, and 3D vision. Recently, new methods have demonstrated the ability to generate videos with controllable camera poses; these techniques leverage pre-trained U-Net-based diffusion models that explicitly disentangle spatial and temporal generation. Still, no existing approach enables camera control for new, transformer-based video diffusion models that process spatial and temporal information jointly. Here, we propose to tame video transformers for 3D camera control using a ControlNet-like conditioning mechanism that incorporates spatiotemporal camera embeddings based on Plücker coordinates. The approach demonstrates state-of-the-art performance for controllable video generation after fine-tuning on the RealEstate10K dataset. To the best of our knowledge, our work is the first to enable camera control for transformer-based video diffusion models.
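The abstract's conditioning signal is a per-pixel Plücker-coordinate embedding derived from the camera pose of each frame. Below is a minimal, hedged sketch of how such an embedding could be computed from intrinsics and a camera-to-world matrix; the function name, interface, and shapes are illustrative assumptions, not the paper's actual implementation.

```python
import torch


def plucker_embedding(K: torch.Tensor, c2w: torch.Tensor,
                      height: int, width: int) -> torch.Tensor:
    """Per-pixel Plücker coordinates (o x d, d) for a single camera.

    K   : (3, 3) camera intrinsics
    c2w : (4, 4) camera-to-world extrinsics
    Returns a (6, H, W) embedding. Illustrative sketch only; the paper's
    exact parameterization may differ.
    """
    # Pixel grid sampled at pixel centers
    v, u = torch.meshgrid(
        torch.arange(height, dtype=torch.float32) + 0.5,
        torch.arange(width, dtype=torch.float32) + 0.5,
        indexing="ij",
    )
    pix = torch.stack([u, v, torch.ones_like(u)], dim=0).reshape(3, -1)  # (3, H*W)

    # Back-project pixels to camera-space rays, rotate into world space, normalize
    dirs_cam = torch.linalg.inv(K) @ pix
    dirs_world = c2w[:3, :3] @ dirs_cam
    dirs_world = dirs_world / dirs_world.norm(dim=0, keepdim=True)

    # Ray origin is the camera center, broadcast to every pixel
    origin = c2w[:3, 3:4].expand_as(dirs_world)

    # Plücker coordinates: moment (o x d) concatenated with direction d
    moment = torch.cross(origin, dirs_world, dim=0)
    return torch.cat([moment, dirs_world], dim=0).reshape(6, height, width)
```

Stacking these 6-channel maps across frames yields the spatiotemporal camera embedding that a ControlNet-like branch can consume alongside the video latents.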

View on arXiv
@article{bahmani2025_2407.12781,
  title={VD3D: Taming Large Video Diffusion Transformers for 3D Camera Control},
  author={Sherwin Bahmani and Ivan Skorokhodov and Aliaksandr Siarohin and Willi Menapace and Guocheng Qian and Michael Vasilkovsky and Hsin-Ying Lee and Chaoyang Wang and Jiaxu Zou and Andrea Tagliasacchi and David B. Lindell and Sergey Tulyakov},
  journal={arXiv preprint arXiv:2407.12781},
  year={2025}
}