
HR-Bandit: Human-AI Collaborated Linear Recourse Bandit

International Conference on Artificial Intelligence and Statistics (AISTATS), 2024
Main: 8 pages · 9 figures · Bibliography: 4 pages · Appendix: 7 pages
Abstract

Human doctors frequently recommend actionable recourses that allow patients to modify their conditions to access more effective treatments. Inspired by such healthcare scenarios, we propose the Recourse Linear UCB (RLinUCB) algorithm, which optimizes both action selection and feature modifications by balancing exploration and exploitation. We further extend this to the Human-AI Linear Recourse Bandit (HR-Bandit), which integrates human expertise to enhance performance. HR-Bandit offers three key guarantees: (i) a warm-start guarantee for improved initial performance, (ii) a human-effort guarantee to minimize required human interactions, and (iii) a robustness guarantee that ensures sublinear regret even when human decisions are suboptimal. Empirical results, including a healthcare case study, validate its superior performance against existing benchmarks.
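The abstract describes an algorithm that jointly selects an action and a feature modification using an upper-confidence-bound rule. The sketch below is a minimal, illustrative take on that idea built on standard LinUCB machinery; the class name, the L2 modification budget, and the rule of shifting features along the current parameter estimate are all assumptions for illustration, not the paper's actual RLinUCB.

```python
import numpy as np

class RecourseLinUCB:
    """Illustrative LinUCB variant with a feature-modification (recourse) step.

    Hypothetical sketch: in addition to choosing an arm, the learner may
    shift each arm's feature vector by up to `budget` in L2 norm before
    scoring it. The shift direction (along the estimated parameter) is an
    assumed rule, not taken from the paper.
    """

    def __init__(self, dim, alpha=1.0, budget=0.5):
        self.A = np.eye(dim)      # regularized Gram matrix
        self.b = np.zeros(dim)    # accumulated reward-weighted features
        self.alpha = alpha        # exploration width
        self.budget = budget      # max L2 norm of the feature shift

    def _ucb(self, x):
        # Standard LinUCB score: estimated reward plus confidence width.
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b
        return theta @ x + self.alpha * np.sqrt(x @ A_inv @ x)

    def select(self, contexts):
        # Shift each context along the current estimate theta (assumed
        # recourse rule), then pick the (arm, modified context) pair with
        # the highest UCB score.
        theta = np.linalg.inv(self.A) @ self.b
        direction = theta / (np.linalg.norm(theta) + 1e-12)
        candidates = [(i, x + self.budget * direction)
                      for i, x in enumerate(contexts)]
        return max(candidates, key=lambda ix: self._ucb(ix[1]))

    def update(self, x, reward):
        # Rank-one update with the (modified) feature actually played.
        self.A += np.outer(x, x)
        self.b += reward * x
```

A typical loop would call `select` on the current contexts, observe a reward for the returned modified feature vector, and pass both back to `update`; the exploration/exploitation balance comes from the confidence term in `_ucb`.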
