
Contextual Bandits for Unbounded Context Distributions

Abstract

Nonparametric contextual bandits are an important model of sequential decision-making problems. Under the $\alpha$-Tsybakov margin condition, existing research has established a regret bound of $\tilde{O}\left(T^{1-\frac{\alpha+1}{d+2}}\right)$ for bounded supports. However, the optimal regret with unbounded contexts has not been analyzed. The challenge of solving contextual bandit problems with unbounded support is to achieve the exploration-exploitation tradeoff and the bias-variance tradeoff simultaneously. In this paper, we solve the nonparametric contextual bandit problem with unbounded contexts. We propose two nearest-neighbor methods combined with UCB exploration. The first method uses a fixed $k$. Our analysis shows that this method achieves minimax-optimal regret under a weak margin condition and relatively light-tailed context distributions. The second method uses an adaptive $k$. With a proper data-driven selection of $k$, this method achieves an expected regret of $\tilde{O}\left(T^{1-\frac{(\alpha+1)\beta}{\alpha+(d+2)\beta}}+T^{1-\beta}\right)$, in which $\beta$ is a parameter describing the tail strength. This bound matches the minimax lower bound up to logarithmic factors, indicating that the second method is approximately optimal.
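To make the fixed-$k$ idea concrete, the following is a minimal illustrative sketch of a $k$-nearest-neighbor rule with a UCB exploration bonus, not the authors' exact algorithm: the function name `knn_ucb_step`, the exploration constant `conf`, and the particular bonus form $\sqrt{\log t / k}$ are assumptions made for illustration.

```python
import numpy as np

def knn_ucb_step(x, history, k, t, conf=1.0):
    """Pick an arm for context x using a k-NN estimate plus a UCB bonus.

    Illustrative sketch only (not the paper's exact method).
    history[a] is a list of (context, reward) pairs observed for arm a.
    """
    scores = []
    for obs in history:
        if len(obs) < k:
            # Not enough samples for a k-NN estimate: force exploration.
            scores.append(np.inf)
            continue
        contexts = np.array([c for c, _ in obs])
        rewards = np.array([r for _, r in obs])
        # Indices of the k stored contexts nearest to x.
        dists = np.linalg.norm(contexts - x, axis=1)
        idx = np.argsort(dists)[:k]
        mean = rewards[idx].mean()
        # Assumed UCB-style bonus shrinking in k, growing slowly in t.
        bonus = conf * np.sqrt(np.log(t + 1) / k)
        scores.append(mean + bonus)
    return int(np.argmax(scores))
```

For example, with two arms whose stored rewards near the query context are 1 and 0 respectively, the rule selects arm 0; an arm with fewer than $k$ observations is selected first regardless of its estimate.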

@article{zhao2025_2408.09655,
  title={Contextual Bandits for Unbounded Context Distributions},
  author={Puning Zhao and Rongfei Fan and Shaowei Wang and Li Shen and Qixin Zhang and Zong Ke and Tianhang Zheng},
  journal={arXiv preprint arXiv:2408.09655},
  year={2025}
}