22
0

FlashDepth: Real-time Streaming Video Depth Estimation at 2K Resolution

Abstract

A versatile video depth estimation model should (1) be accurate and consistent across frames, (2) produce high-resolution depth maps, and (3) support real-time streaming. We propose FlashDepth, a method that satisfies all three requirements, performing depth estimation on a 2044x1148 streaming video at 24 FPS. We show that, with careful modifications to pretrained single-image depth models, these capabilities are enabled with relatively little data and training. We evaluate our approach across multiple unseen datasets against state-of-the-art depth models, and find that ours outperforms them in terms of boundary sharpness and speed by a significant margin, while maintaining competitive accuracy. We hope our model will enable various applications that require high-resolution depth, such as video editing, and online decision-making, such as robotics.

View on arXiv
@article{chou2025_2504.07093,
  title={ FlashDepth: Real-time Streaming Video Depth Estimation at 2K Resolution },
  author={ Gene Chou and Wenqi Xian and Guandao Yang and Mohamed Abdelfattah and Bharath Hariharan and Noah Snavely and Ning Yu and Paul Debevec },
  journal={arXiv preprint arXiv:2504.07093},
  year={ 2025 }
}
Comments on this paper