Adaptive Shielding for Safe Reinforcement Learning under Hidden-Parameter Dynamics Shifts
Unseen shifts in environment dynamics, driven by hidden parameters such as friction or gravity, pose a challenge for maintaining safety. We address this challenge with Adaptive Shielding, a framework for safe reinforcement learning in constrained hidden-parameter Markov decision processes. A function encoder infers a low-dimensional representation of the underlying dynamics online from transition data, allowing the shield to adapt. To ensure safety during this process, we use a two-layer strategy. First, we introduce safety-regularized optimization that proactively trains the policy away from high-cost regions. Second, the adaptive shield reactively uses the inferred dynamics to forecast safety risks and applies uncertainty-aware bounds from conformal prediction to filter unsafe actions. We prove that the shield's prediction errors translate into bounds on the average cost rate. Empirically, across Safe-Gym benchmarks with varying hidden parameters, our approach outperforms baselines on the return-safety trade-off and generalizes reliably to unseen dynamics, while incurring only modest execution-time overhead. Code is available at this https URL.
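The abstract's second safety layer — filtering actions whose forecast cost, inflated by a conformal bound, would exceed a budget — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the split-conformal quantile and the `predict_cost`, `cost_budget`, and action-set names are hypothetical stand-ins for the paper's learned dynamics model and constraint threshold.

```python
import numpy as np

def conformal_quantile(residuals, alpha=0.1):
    # Split-conformal quantile: given n calibration residuals, the
    # ceil((n + 1) * (1 - alpha))-th order statistic upper-bounds a
    # fresh prediction error with probability at least 1 - alpha.
    n = len(residuals)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    return float(np.sort(residuals)[min(k, n) - 1])

def shield_filter(actions, predict_cost, q, cost_budget):
    # Keep only actions whose forecast cost, inflated by the
    # conformal bound q, still fits within the safety budget.
    return [a for a in actions if predict_cost(a) + q <= cost_budget]

# Toy usage: absolute residuals from a held-out calibration set,
# a stand-in linear cost predictor, and a small discrete action set.
rng = np.random.default_rng(0)
residuals = np.abs(rng.normal(0.0, 0.1, size=200))
q = conformal_quantile(residuals, alpha=0.1)
safe = shield_filter([0.0, 0.5, 1.0], lambda a: 0.4 * a, q, cost_budget=0.5)
```

Every action the filter keeps satisfies the inflated-cost check, so under the conformal coverage guarantee its true cost stays within budget with high probability.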