Modern data-driven surrogate models for weather forecasting provide accurate short-term predictions but inaccurate and nonphysical long-term forecasts. This paper investigates online weather prediction using machine learning surrogates supplemented with partial and noisy observations. We empirically demonstrate and theoretically justify that, despite the long-time instability of the surrogates and the sparsity of the observations, filtering estimates can remain accurate in the long-time horizon. As a case study, we integrate FourCastNet, a weather surrogate model, within a variational data assimilation framework using partial, noisy ERA5 data. Our results show that filtering estimates remain accurate over a year-long assimilation window and provide effective initial conditions for forecasting tasks, including extreme event prediction.
View on arXiv@article{adrian2025_2405.13180, title={ Data Assimilation with Machine Learning Surrogate Models: A Case Study with FourCastNet }, author={ Melissa Adrian and Daniel Sanz-Alonso and Rebecca Willett }, journal={arXiv preprint arXiv:2405.13180}, year={ 2025 } }