ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2405.15757
57
7

Looking Backward: Streaming Video-to-Video Translation with Feature Banks

24 May 2024
Feng Liang
Akio Kodaira
Chenfeng Xu
M. Tomizuka
Kurt Keutzer
Diana Marculescu
    DiffM
    VGen
ArXivPDFHTML
Abstract

This paper introduces StreamV2V, a diffusion model that achieves real-time streaming video-to-video (V2V) translation with user prompts. Unlike prior V2V methods using batches to process limited frames, we opt to process frames in a streaming fashion, to support unlimited frames. At the heart of StreamV2V lies a backward-looking principle that relates the present to the past. This is realized by maintaining a feature bank, which archives information from past frames. For incoming frames, StreamV2V extends self-attention to include banked keys and values and directly fuses similar past features into the output. The feature bank is continually updated by merging stored and new features, making it compact but informative. StreamV2V stands out for its adaptability and efficiency, seamlessly integrating with image diffusion models without fine-tuning. It can run 20 FPS on one A100 GPU, being 15x, 46x, 108x, and 158x faster than FlowVid, CoDeF, Rerender, and TokenFlow, respectively. Quantitative metrics and user studies confirm StreamV2V's exceptional ability to maintain temporal consistency.

View on arXiv
@article{liang2025_2405.15757,
  title={ Looking Backward: Streaming Video-to-Video Translation with Feature Banks },
  author={ Feng Liang and Akio Kodaira and Chenfeng Xu and Masayoshi Tomizuka and Kurt Keutzer and Diana Marculescu },
  journal={arXiv preprint arXiv:2405.15757},
  year={ 2025 }
}
Comments on this paper