MotionCanvas: Cinematic Shot Design with Controllable Image-to-Video Generation

Abstract

This paper presents a method that allows users to design cinematic video shots in the context of image-to-video generation. Shot design, a critical aspect of filmmaking, involves meticulously planning both camera movements and object motions in a scene. However, enabling intuitive shot design in modern image-to-video generation systems presents two main challenges: first, effectively capturing user intentions for the motion design, where both camera movements and scene-space object motions must be specified jointly; and second, representing motion information in a form that a video diffusion model can effectively utilize to synthesize the image animations. To address these challenges, we introduce MotionCanvas, a method that integrates user-driven controls into image-to-video (I2V) generation models, allowing users to control both object and camera motions in a scene-aware manner. By connecting insights from classical computer graphics with contemporary video generation techniques, we demonstrate the ability to achieve 3D-aware motion control in I2V synthesis without requiring costly 3D-related training data. MotionCanvas enables users to intuitively depict scene-space motion intentions, and translates them into spatiotemporal motion-conditioning signals for video diffusion models. We demonstrate the effectiveness of our method on a wide range of real-world image content and shot-design scenarios, highlighting its potential to enhance creative workflows in digital content creation and to adapt to various image and video editing applications.

@article{xing2025_2502.04299,
  title={MotionCanvas: Cinematic Shot Design with Controllable Image-to-Video Generation},
  author={Jinbo Xing and Long Mai and Cusuh Ham and Jiahui Huang and Aniruddha Mahapatra and Chi-Wing Fu and Tien-Tsin Wong and Feng Liu},
  journal={arXiv preprint arXiv:2502.04299},
  year={2025}
}