
Exploring Time-Step Size in Reinforcement Learning for Sepsis Treatment

Main: 9 pages, Appendix: 5 pages, Bibliography: 4 pages; 6 figures, 10 tables
Abstract

Existing studies on reinforcement learning (RL) for sepsis management have mostly followed an established problem setup, in which patient data are aggregated into 4-hour time steps. Although concerns have been raised that this coarse time-step size may distort patient dynamics and lead to suboptimal treatment policies, the extent to which this is a problem in practice remains unexplored. In this work, we conducted empirical experiments for a controlled comparison of four time-step sizes ($\Delta t = 1, 2, 4, 8$ h) in this domain, following an identical offline RL pipeline. To enable a fair comparison across time-step sizes, we designed action re-mapping methods that allow policies to be evaluated on datasets with different time-step sizes, and conducted cross-$\Delta t$ model selection under two policy-learning setups. Our goal was to quantify how time-step size influences state representation learning, behavior cloning, policy training, and off-policy evaluation. Our results show that performance trends across $\Delta t$ vary with the learning setup, while policies learned at finer time-step sizes ($\Delta t = 1$ h and $2$ h) using a static behavior policy achieve the best overall performance and stability. Our work highlights time-step size as a core design choice in offline RL for healthcare and provides evidence supporting alternatives beyond the conventional 4-hour setup.
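To make the action re-mapping idea concrete, below is a minimal sketch of what mapping actions between time-step sizes could look like, assuming the discretized fluid-by-vasopressor action grid commonly used in the sepsis RL literature. The function names, bin edges, and aggregation rules (repeating coarse decisions over sub-steps; summing fluids and averaging vasopressor rates over coarse windows) are illustrative assumptions, not the paper's actual method.

```python
import numpy as np

def remap_coarse_to_fine(coarse_actions, ratio):
    """Hypothetical re-mapping from a coarse time step to a finer one:
    repeat each coarse-step decision (e.g., one 4 h action) across the
    `ratio` finer sub-steps it spans, so a policy trained at a coarse
    Delta-t can be rolled out against a finer-Delta-t dataset."""
    return np.repeat(np.asarray(coarse_actions), ratio)

def remap_fine_to_coarse(fluid_doses, vaso_rates, ratio, fluid_bins, vaso_bins):
    """Hypothetical re-mapping in the other direction: aggregate raw doses
    over each coarse window (sum IV fluids, average the vasopressor rate),
    then re-discretize into a flat discrete action index.
    Assumes the trajectory length is divisible by `ratio`."""
    fluids = np.asarray(fluid_doses).reshape(-1, ratio).sum(axis=1)
    vasos = np.asarray(vaso_rates).reshape(-1, ratio).mean(axis=1)
    fluid_idx = np.digitize(fluids, fluid_bins)   # 0 .. len(fluid_bins)
    vaso_idx = np.digitize(vasos, vaso_bins)      # 0 .. len(vaso_bins)
    # Flatten the (fluid, vasopressor) pair into one discrete action id.
    return fluid_idx * (len(vaso_bins) + 1) + vaso_idx
```

For example, evaluating a policy trained at $\Delta t = 4$ h on a 1 h dataset would use `remap_coarse_to_fine(actions, ratio=4)`, while the reverse direction would re-aggregate the hourly doses before re-binning.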
