EDEN: Enhanced Diffusion for High-quality Large-motion Video Frame Interpolation

Handling complex or nonlinear motion patterns has long been a challenge for video frame interpolation. Although recent diffusion-based methods improve on traditional optical flow-based approaches, they still struggle to generate sharp, temporally consistent frames in scenarios with large motion. To address this limitation, we introduce EDEN, an Enhanced Diffusion for high-quality large-motion vidEo frame iNterpolation. Our approach first uses a transformer-based tokenizer to produce refined latent representations of the intermediate frames for diffusion models. We then enhance the diffusion transformer with temporal attention across the process and incorporate a start-end frame difference embedding to guide the generation of dynamic motion. Extensive experiments demonstrate that EDEN achieves state-of-the-art results across popular benchmarks, including a nearly 10% reduction in LPIPS on DAVIS and SNU-FILM and an 8% improvement on DAIN-HD.
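The abstract does not say how the start-end frame difference embedding is computed. As a rough illustration only, the sketch below assumes one plausible reading: summarize the difference between the two input frames as a single scalar (mean absolute pixel difference) and map it to a sinusoidal vector, in the style commonly used for diffusion timestep embeddings. The function names, the choice of scalar, and the embedding size are all hypothetical and are not taken from EDEN.

```python
import math


def motion_magnitude(start, end):
    """Mean absolute difference between two equally sized frames.

    Frames are lists of rows of floats. This scalar is an assumed
    stand-in for "how much changed" between the start and end frames.
    """
    total, count = 0.0, 0
    for row_s, row_e in zip(start, end):
        for a, b in zip(row_s, row_e):
            total += abs(a - b)
            count += 1
    return total / count


def sinusoidal_embedding(value, dim=8, max_period=10000.0):
    """Map a scalar to `dim` values: dim/2 sines followed by dim/2 cosines.

    `dim` is assumed to be even. Frequencies decay geometrically from 1
    down toward 1/max_period, as in common timestep embeddings.
    """
    half = dim // 2
    freqs = [math.exp(-math.log(max_period) * i / half) for i in range(half)]
    return [math.sin(value * f) for f in freqs] + [math.cos(value * f) for f in freqs]


def frame_difference_embedding(start, end, dim=8):
    """Hypothetical conditioning vector derived from the start/end difference."""
    return sinusoidal_embedding(motion_magnitude(start, end), dim)


if __name__ == "__main__":
    static = [[0.0, 0.5], [1.0, 0.25]]
    moving = [[1.0, 0.5], [0.0, 0.25]]
    print(motion_magnitude(static, moving))
    print(frame_difference_embedding(static, moving))
```

In a diffusion transformer, a vector like this would typically be combined with the timestep embedding so the denoiser can tell small-motion pairs from large-motion ones. Whether EDEN conditions in this way is not stated in the abstract.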
@article{zhang2025_2503.15831,
  title={EDEN: Enhanced Diffusion for High-quality Large-motion Video Frame Interpolation},
  author={Zihao Zhang and Haoran Chen and Haoyu Zhao and Guansong Lu and Yanwei Fu and Hang Xu and Zuxuan Wu},
  journal={arXiv preprint arXiv:2503.15831},
  year={2025}
}