Lumiere: A Space-Time Diffusion Model for Video Generation

23 January 2024 · arXiv:2401.12945
Omer Bar-Tal, Hila Chefer, Omer Tov, Charles Herrmann, Roni Paiss, Shiran Zada, Ariel Ephrat, Junhwa Hur, Guanghui Liu, Amit Raj, Yuanzhen Li, Michael Rubinstein, Tomer Michaeli, Oliver Wang, Deqing Sun, Tali Dekel, Inbar Mosseri
Abstract

We introduce Lumiere -- a text-to-video diffusion model designed for synthesizing videos that portray realistic, diverse and coherent motion -- a pivotal challenge in video synthesis. To this end, we introduce a Space-Time U-Net architecture that generates the entire temporal duration of the video at once, through a single pass in the model. This is in contrast to existing video models which synthesize distant keyframes followed by temporal super-resolution -- an approach that inherently makes global temporal consistency difficult to achieve. By deploying both spatial and (importantly) temporal down- and up-sampling and leveraging a pre-trained text-to-image diffusion model, our model learns to directly generate a full-frame-rate, low-resolution video by processing it in multiple space-time scales. We demonstrate state-of-the-art text-to-video generation results, and show that our design easily facilitates a wide range of content creation tasks and video editing applications, including image-to-video, video inpainting, and stylized generation.
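The central architectural point in the abstract, a U-Net that downsamples and upsamples the video in both space and time so that the entire clip is denoised in one pass rather than via distant keyframes plus temporal super-resolution, can be illustrated with a small PyTorch sketch. This is a minimal toy under stated assumptions, not the authors' implementation: the module names, channel widths, and plain Conv3d blocks are illustrative, and it omits text conditioning and the pre-trained text-to-image backbone that Lumiere builds on.

# Minimal sketch (illustrative only): a toy space-time U-Net that
# downsamples a video 2x in time AND space at each level, processes it at a
# coarse space-time resolution, and upsamples back to full frame rate.
import torch
import torch.nn as nn


class SpaceTimeDown(nn.Module):
    """Downsample a video tensor (B, C, T, H, W) by 2x in time and space."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.conv = nn.Conv3d(in_ch, out_ch, kernel_size=3,
                              stride=(2, 2, 2), padding=1)
        self.act = nn.SiLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.act(self.conv(x))


class SpaceTimeUp(nn.Module):
    """Upsample by 2x in time and space, fusing the matching skip connection."""
    def __init__(self, in_ch: int, skip_ch: int, out_ch: int):
        super().__init__()
        self.up = nn.ConvTranspose3d(in_ch, out_ch, kernel_size=4,
                                     stride=(2, 2, 2), padding=1)
        self.fuse = nn.Conv3d(out_ch + skip_ch, out_ch, kernel_size=3, padding=1)
        self.act = nn.SiLU()

    def forward(self, x: torch.Tensor, skip: torch.Tensor) -> torch.Tensor:
        x = self.act(self.up(x))
        x = torch.cat([x, skip], dim=1)           # concatenate along channels
        return self.act(self.fuse(x))


class TinySpaceTimeUNet(nn.Module):
    """Toy U-Net that processes an entire clip in a single pass (no keyframes)."""
    def __init__(self, channels: int = 3, base: int = 32):
        super().__init__()
        self.inp = nn.Conv3d(channels, base, kernel_size=3, padding=1)
        self.down1 = SpaceTimeDown(base, base * 2)       # T/2, H/2, W/2
        self.down2 = SpaceTimeDown(base * 2, base * 4)   # T/4, H/4, W/4
        self.mid = nn.Conv3d(base * 4, base * 4, kernel_size=3, padding=1)
        self.up2 = SpaceTimeUp(base * 4, base * 2, base * 2)
        self.up1 = SpaceTimeUp(base * 2, base, base)
        self.out = nn.Conv3d(base, channels, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h0 = self.inp(x)
        h1 = self.down1(h0)
        h2 = self.down2(h1)
        h2 = self.mid(h2)
        h = self.up2(h2, h1)
        h = self.up1(h, h0)
        return self.out(h)                        # e.g. predicted noise for the clip


if __name__ == "__main__":
    # A 16-frame, 64x64 clip; the full temporal extent is handled at once.
    video = torch.randn(1, 3, 16, 64, 64)
    print(TinySpaceTimeUNet()(video).shape)       # torch.Size([1, 3, 16, 64, 64])

The point of the sketch is that the temporal axis is compressed alongside the spatial axes (stride 2 in T, H, and W), so most of the computation happens on a short, coarse space-time volume while the output still covers every frame of the clip at full frame rate.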
