
Repurposing Pre-trained Video Diffusion Models for Event-based Video Interpolation

Computer Vision and Pattern Recognition (CVPR), 2024
Main: 8 pages, Bibliography: 3 pages, Appendix: 7 pages
20 figures, 6 tables
Abstract

Video Frame Interpolation aims to recover realistic missing frames between observed frames, generating a high-frame-rate video from a low-frame-rate one. Without additional guidance, however, the large motion between frames makes this problem ill-posed. Event-based Video Frame Interpolation (EVFI) addresses this challenge by using sparse, high-temporal-resolution event measurements as motion guidance, which allows EVFI methods to significantly outperform frame-only methods. To date, however, EVFI methods have relied on a limited set of paired event-frame training data, severely constraining their performance and generalization. In this work, we overcome the limited-data challenge by adapting pre-trained video diffusion models, trained on internet-scale datasets, to EVFI. We experimentally validate our approach on real-world EVFI datasets, including a new one that we introduce. Our method outperforms existing methods and generalizes across cameras far better than existing approaches.
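The abstract does not detail how the event stream is fed to the diffusion model. As a rough, hypothetical illustration of the event-guidance idea common in EVFI work (not the authors' implementation), the sketch below bins raw events into a temporal voxel grid, a standard conditioning representation. The function name, the (t, x, y, polarity) event layout, and the bilinear temporal splitting are all assumptions made for illustration.

```python
import numpy as np

def events_to_voxel_grid(events, num_bins, height, width):
    """Accumulate (t, x, y, polarity) events into a (num_bins, H, W) voxel grid.

    A sketch of a common EVFI conditioning input, not this paper's pipeline:
    each event's polarity is split bilinearly between its two nearest time bins.
    """
    voxel = np.zeros((num_bins, height, width), dtype=np.float32)
    if len(events) == 0:
        return voxel

    t = events[:, 0]
    x = events[:, 1].astype(int)
    y = events[:, 2].astype(int)
    p = events[:, 3]

    # Normalize timestamps to [0, num_bins - 1]; guard against a zero time span.
    t_norm = (t - t.min()) / max(t.max() - t.min(), 1e-9) * (num_bins - 1)
    t0 = np.floor(t_norm).astype(int)
    frac = t_norm - t0

    # Split each event's contribution across the two adjacent temporal bins.
    np.add.at(voxel, (t0, y, x), p * (1.0 - frac))
    np.add.at(voxel, (np.minimum(t0 + 1, num_bins - 1), y, x), p * frac)
    return voxel

# Example: 1000 random events on a hypothetical 240x180 sensor, 5 time slices.
rng = np.random.default_rng(0)
events = np.stack([
    rng.uniform(0.0, 1.0, 1000),    # timestamps (s)
    rng.integers(0, 240, 1000),     # x coordinates
    rng.integers(0, 180, 1000),     # y coordinates
    rng.choice([-1.0, 1.0], 1000),  # polarity
], axis=1)
grid = events_to_voxel_grid(events, num_bins=5, height=180, width=240)
print(grid.shape)  # (5, 180, 240)
```

A grid like this can be passed to a conditioning encoder alongside the boundary frames; how the paper actually injects event information into the pre-trained diffusion model is described in the full text, not here.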
