SF-V: Single Forward Video Generation Model

6 June 2024
Zhixing Zhang
Yanyu Li
Yushu Wu
Yanwu Xu
Anil Kag
Ivan Skorokhodov
Willi Menapace
Aliaksandr Siarohin
Junli Cao
Dimitris N. Metaxas
Sergey Tulyakov
Jian Ren
Abstract

Diffusion-based video generation models have demonstrated remarkable success in producing high-fidelity videos through an iterative denoising process. However, these models require multiple denoising steps during sampling, resulting in high computational costs. In this work, we propose a novel approach to obtain single-step video generation models by leveraging adversarial training to fine-tune pre-trained video diffusion models. We show that, through adversarial training, a multi-step video diffusion model, i.e., Stable Video Diffusion (SVD), can be trained to perform a single forward pass to synthesize high-quality videos, capturing both temporal and spatial dependencies in the video data. Extensive experiments demonstrate that our method achieves competitive generation quality with significantly reduced computational overhead for the denoising process (i.e., around 23× speedup compared with SVD and 6× speedup compared with existing works, with even better generation quality), paving the way for real-time video synthesis and editing. More visualization results are publicly available at https://snap-research.github.io/SF-V.
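To make the core idea concrete, below is a minimal, self-contained PyTorch sketch of adversarially fine-tuning a one-step generator against a spatio-temporal discriminator. All module names, layer choices, tensor shapes, and the non-saturating GAN loss are hypothetical placeholders picked for illustration; the paper's actual architecture initializes the generator from pre-trained SVD weights, which this toy stand-in does not do.

```python
# Hypothetical sketch of single-step adversarial fine-tuning (not the authors' code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class OneStepVideoGenerator(nn.Module):
    """Stand-in for a pre-trained video diffusion network fine-tuned to denoise in one pass."""
    def __init__(self, channels=3):
        super().__init__()
        # A real setup would load pre-trained SVD weights here instead of a toy Conv3d.
        self.net = nn.Conv3d(channels, channels, kernel_size=3, padding=1)

    def forward(self, noisy_latents, cond):
        # Single forward pass: map noise + conditioning directly to a clean video.
        return self.net(noisy_latents + cond)

class SpatioTemporalDiscriminator(nn.Module):
    """Stand-in discriminator scoring realism jointly over space and time."""
    def __init__(self, channels=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(channels, 16, kernel_size=3, stride=2, padding=1),
            nn.SiLU(),
            nn.AdaptiveAvgPool3d(1),
            nn.Flatten(),
            nn.Linear(16, 1),
        )

    def forward(self, video):
        return self.net(video)

def train_step(gen, disc, opt_g, opt_d, real_video, cond):
    noise = torch.randn_like(real_video)

    # Discriminator update: real videos vs. detached one-step generations.
    fake = gen(noise, cond).detach()
    d_loss = (F.softplus(-disc(real_video)) + F.softplus(disc(fake))).mean()
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator update: fool the discriminator with a single forward pass.
    fake = gen(noise, cond)
    g_loss = F.softplus(-disc(fake)).mean()
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()

if __name__ == "__main__":
    gen, disc = OneStepVideoGenerator(), SpatioTemporalDiscriminator()
    opt_g = torch.optim.Adam(gen.parameters(), lr=1e-4)
    opt_d = torch.optim.Adam(disc.parameters(), lr=1e-4)
    # Toy batch layout: (batch, channels, frames, height, width).
    real = torch.randn(2, 3, 8, 32, 32)
    cond = torch.randn(2, 3, 8, 32, 32)  # e.g., an encoded conditioning image, broadcast over frames
    print(train_step(gen, disc, opt_g, opt_d, real, cond))
```

The non-saturating GAN objective used here is one common choice for diffusion distillation; the key property the sketch demonstrates is that sampling cost collapses to a single generator forward pass, which is where the reported speedup over iterative denoising comes from.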
