Power-Constrained Bandits
Contextual bandits often provide simple and effective personalization in decision-making problems, making them popular in many domains, including digital health. However, when a bandit is deployed in the context of a scientific study, the aim is not only to personalize for each individual, but also to determine, with sufficient statistical power, whether or not the system's intervention is effective. These two objectives are typically pursued under different model assumptions, making it hard to determine how achieving one goal affects the other. In this work, we develop general meta-algorithms that modify existing bandit algorithms so that sufficient power is guaranteed, without a significant decrease in average return. We also demonstrate that our meta-algorithms are robust to various model mis-specifications.
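The abstract does not spell out how a power guarantee can coexist with personalization, but one standard device in this setting is probability clipping: keep every action's selection probability bounded away from 0 and 1, so each arm accrues enough expected samples for the treatment-effect test, while the base bandit still steers probabilities within those bounds. The sketch below is illustrative only; the function names, the clipping bounds `p_min`/`p_max`, and the wrapper design are assumptions for exposition, not the paper's specific meta-algorithms.

```python
import random

def clip_probability(p, p_min=0.1, p_max=0.9):
    """Clip an action-selection probability into [p_min, p_max].

    Bounding the probability away from 0 and 1 guarantees a minimum
    expected number of samples under each arm, which is what a power
    calculation for the intervention effect requires. The bounds
    0.1/0.9 are illustrative placeholders, not values from the paper.
    """
    return min(max(p, p_min), p_max)

def power_constrained_action(p_raw, p_min=0.1, p_max=0.9, rng=random):
    """Wrap any base bandit policy for a binary intervention.

    `p_raw` is the base algorithm's raw probability of selecting the
    intervention (action 1). We clip it, sample the action from the
    clipped probability, and return both the action and the
    probability actually used (needed later for weighted analyses).
    """
    p = clip_probability(p_raw, p_min, p_max)
    action = int(rng.random() < p)
    return action, p
```

In this style of wrapper, the base algorithm can be greedy or near-deterministic, yet every participant still has at least a `p_min` chance of receiving each condition, which is the kind of property a power guarantee can be built on.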