
HydroGEM: A Self Supervised Zero Shot Hybrid TCN Transformer Foundation Model for Continental Scale Streamflow Quality Control

Ijaz Ul Haq
Byung Suk Lee
Julia N. Perdrial
David Baude
Main: 73 Pages
13 Figures
Bibliography: 12 Pages
16 Tables
Abstract

Advances in sensor networks have enabled real-time stream discharge monitoring, yet persistent sensor malfunctions limit data utility. Manual quality control by expert hydrologists cannot scale with networks generating millions of measurements annually. We introduce HydroGEM, a foundation model for continental-scale streamflow quality control designed to support human expertise. HydroGEM uses self-supervised pretraining on 6.03 million clean sequences from 3,724 USGS stations to learn general hydrological representations, followed by fine-tuning with synthetic anomalies for detection and reconstruction. A hybrid TCN-Transformer architecture (14.2M parameters) captures both local and long-range temporal dependencies, while hierarchical normalization handles six orders of magnitude in discharge. On held-out observations from 799 stations with 18 synthetic anomaly types grounded in USGS standards, HydroGEM achieves F1 = 0.792 for detection and 68.7% reconstruction error reduction, outperforming the strongest baseline by 36.3%. For cross-national validation on 100 Environment and Climate Change Canada stations using tolerant evaluation with a ±24-hour buffer, HydroGEM achieves Tolerant F1 = 0.70 with 90.1% segment-level event detection, demonstrating cross-national generalization. The model maintains consistent detection across correction magnitudes and aligns with operational seasonal patterns, with peak flagging during winter ice-affected periods matching hydrologists' correction behavior. Architectural separation between simplified training anomalies and complex test anomalies confirms that performance reflects learned hydrometric principles rather than pattern memorization.
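
To make the idea of tolerant evaluation concrete, the sketch below shows one plausible way to compute an F1 score with a time buffer. It is not taken from the paper: the function names `dilate` and `tolerant_f1`, the point-wise matching rule, and the assumption of hourly samples (so that 24 steps equal 24 hours) are all illustrative choices, and the authors' exact definition may differ.

```python
def dilate(flags, buffer):
    """Mark every index lying within `buffer` steps of a True flag."""
    n = len(flags)
    out = [False] * n
    for i, flagged in enumerate(flags):
        if flagged:
            for j in range(max(0, i - buffer), min(n, i + buffer + 1)):
                out[j] = True
    return out


def tolerant_f1(y_true, y_pred, buffer=24):
    """F1 where a match only needs to fall within `buffer` steps.

    Assumption (not from the paper): samples are hourly, so buffer=24
    corresponds to a 24-hour tolerance on either side.
    """
    true_zone = dilate(y_true, buffer)  # where a prediction counts as correct
    pred_zone = dilate(y_pred, buffer)  # where a true anomaly counts as found
    n_pred = sum(1 for p in y_pred if p)
    n_true = sum(1 for t in y_true if t)
    matched_pred = sum(1 for p, z in zip(y_pred, true_zone) if p and z)
    matched_true = sum(1 for t, z in zip(y_true, pred_zone) if t and z)
    precision = matched_pred / n_pred if n_pred else 0.0
    recall = matched_true / n_true if n_true else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical 120-hour record: true anomaly at hours 50-52,
# model flags hours 60-62 (ten hours late).
y_true = [50 <= i <= 52 for i in range(120)]
y_pred = [60 <= i <= 62 for i in range(120)]
strict_score = tolerant_f1(y_true, y_pred, buffer=0)
tolerant_score = tolerant_f1(y_true, y_pred, buffer=24)
```

With `buffer=0` the function reduces to ordinary point-wise F1, so the late detection above scores zero under strict matching but counts as a full match under the 24-step tolerance. A buffer of this kind is useful when reference corrections have imprecise start and end times.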
