A data-driven approach to the forecasting of ground-level ozone concentration

The ability to forecast the concentration of air pollutants in an urban region is crucial for decision-makers who wish to reduce the impact of pollution on public health through active measures (e.g., temporary traffic closures). In this study, we present a machine learning approach to forecasting the day-ahead maximum ozone concentration at several geographical locations in southern Switzerland. Starting from a dataset containing thousands of historical air quality and weather data as well as numerical weather predictions, we select the most relevant features using a genetic algorithm and then use them to train a number of regression models. After finding that forcing engineered features suggested by domain experts into the initial population of the genetic algorithm does not increase the final forecasters' accuracy, we adopted a procedure that is entirely agnostic to atmospheric physics. We then used Shapley values to explain the learned models in terms of feature importance and feature interactions in relation to ozone predictions. Our analysis suggests that the trained models effectively learned explanatory cross-dependencies among atmospheric variables that are described in the ozone photochemistry literature.
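To make the feature-selection step concrete, the following is a minimal sketch of a genetic algorithm that searches over binary feature masks and scores each mask by the validation error of a regression model. It is not the procedure used in the study: the abstract does not specify the genetic operators, the fitness function, or the regression models, so the truncation selection, uniform crossover, bit-flip mutation, least-squares regressor, and synthetic data below are all assumptions, and every function name is hypothetical.

```python
import random


def fit_ridge(X, y, lam=1e-6):
    """Solve (X^T X + lam*I) w = X^T y by Gauss-Jordan elimination."""
    n = len(X[0])
    # Augmented matrix [X^T X + lam*I | X^T y].
    A = [[sum(r[i] * r[j] for r in X) + (lam if i == j else 0.0) for j in range(n)]
         + [sum(r[i] * t for r, t in zip(X, y))] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[p] = A[p], A[c]
        for r in range(n):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][n] / A[i][i] for i in range(n)]


def neg_val_mse(mask, X_tr, y_tr, X_va, y_va):
    """Fitness of a mask: negative validation MSE of a linear model on the kept columns."""
    cols = [j for j, keep in enumerate(mask) if keep]

    def design(X):
        # Intercept column followed by the selected features.
        return [[1.0] + [r[j] for j in cols] for r in X]

    w = fit_ridge(design(X_tr), y_tr)
    preds = [sum(wi * xi for wi, xi in zip(w, r)) for r in design(X_va)]
    return -sum((p - t) ** 2 for p, t in zip(preds, y_va)) / len(y_va)


def select_features(X_tr, y_tr, X_va, y_va, pop_size=20, generations=15,
                    p_mut=0.1, seed=0):
    """Return (best_mask, best_fitness, best_fitness_per_generation)."""
    rng = random.Random(seed)
    n = len(X_tr[0])

    def fitness(mask):
        return neg_val_mse(mask, X_tr, y_tr, X_va, y_va)

    # Random initial population of boolean masks, one gene per feature.
    pop = [[rng.random() < 0.5 for _ in range(n)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        history.append(fitness(ranked[0]))
        elite = ranked[: pop_size // 2]  # truncation selection; elite survives unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a = elite[int(rng.random() * len(elite))]
            b = elite[int(rng.random() * len(elite))]
            child = [ga if rng.random() < 0.5 else gb for ga, gb in zip(a, b)]  # uniform crossover
            child = [(not g) if rng.random() < p_mut else g for g in child]     # bit-flip mutation
            children.append(child)
        pop = elite + children
    best = max(pop, key=fitness)
    return best, fitness(best), history


def make_toy_data(n_rows=120, n_features=6, seed=1):
    """Synthetic stand-in data: only features 0 and 2 drive the target."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(n_rows)]
    y = [3.0 * r[0] - 2.0 * r[2] + rng.gauss(0, 0.1) for r in X]
    split = n_rows * 2 // 3
    return X[:split], y[:split], X[split:], y[split:]


if __name__ == "__main__":
    X_tr, y_tr, X_va, y_va = make_toy_data()
    best, score, history = select_features(X_tr, y_tr, X_va, y_va)
    print("selected feature indices:", [j for j, keep in enumerate(best) if keep])
    print("validation MSE:", -score)
```

Because the elite half of each generation is carried over unchanged, the best fitness never decreases from one generation to the next. In this sketch, forcing expert-engineered features into the initial population, as tested in the study, would amount to replacing some of the random initial masks with masks that have those features switched on.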