Experimental Design for Semiparametric Bandits

We study finite-armed semiparametric bandits, where each arm's reward combines a linear component with an unknown, potentially adversarial shift. This model strictly generalizes classical linear bandits and reflects complexities common in practice. We propose the first experimental-design approach that simultaneously offers a sharp regret bound, a PAC bound, and a best-arm identification guarantee. Our method attains the minimax-optimal regret rate, matching the known lower bound for finite-armed linear bandits, and further achieves logarithmic regret under a positive suboptimality gap condition. These guarantees follow from our refined non-asymptotic analysis of orthogonalized regression that attains the optimal rate, paving the way for robust and efficient learning across a broad class of semiparametric bandit problems.
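To make the role of orthogonalized regression concrete, the following is a minimal sketch and not the paper's algorithm. It assumes the standard semiparametric reward model (reward = arm feature times theta, plus a shift that does not depend on the chosen arm, plus noise) and a fixed sampling design over arms. The arm features, the uniform design `probs`, the shift sequence, and the function names are all hypothetical choices made for illustration. Centering each feature at its mean under the design makes the regressor uncorrelated with the shift, so the shift averages out of the estimate.

```python
import numpy as np


def centered_least_squares(arms, probs, actions, rewards, ridge=1e-8):
    """Least squares on features centered at their mean under the sampling design."""
    mean_feature = probs @ arms                # x_bar = sum_a probs[a] * x_a
    centered = arms[actions] - mean_feature    # (T, d) centered features
    gram = centered.T @ centered + ridge * np.eye(arms.shape[1])
    return np.linalg.solve(gram, centered.T @ rewards)


def simulate(T=20000, seed=0):
    """Compare centered and naive estimates under an arm-independent shift."""
    rng = np.random.default_rng(seed)
    # Hypothetical arm features and true parameter (illustration only).
    arms = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.5]])
    theta = np.array([0.7, -0.3])
    # Assumed fixed design: sample arms uniformly at random.
    probs = np.full(len(arms), 1.0 / len(arms))
    actions = rng.choice(len(arms), size=T, p=probs)
    # Shift fixed in advance, so it cannot depend on the sampled arm.
    shifts = 2.0 + np.sin(np.arange(T) / 50.0)
    noise = 0.1 * rng.standard_normal(T)
    rewards = arms[actions] @ theta + shifts + noise
    theta_centered = centered_least_squares(arms, probs, actions, rewards)
    # Naive baseline: ordinary least squares on raw features, ignoring the shift.
    theta_naive = np.linalg.lstsq(arms[actions], rewards, rcond=None)[0]
    return theta, theta_centered, theta_naive


if __name__ == "__main__":
    theta, theta_centered, theta_naive = simulate()
    print("centered error:", np.linalg.norm(theta_centered - theta))
    print("naive error:   ", np.linalg.norm(theta_naive - theta))
```

In this toy setup the naive fit absorbs the shift into its coefficients, while the centered fit stays close to the true parameter. The sketch uses the known design mean for centering, which is what makes the centered regressor mean-zero regardless of the shift value.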
@article{kim2025_2506.13390,
  title={Experimental Design for Semiparametric Bandits},
  author={Seok-Jin Kim and Gi-Soo Kim and Min-hwan Oh},
  journal={arXiv preprint arXiv:2506.13390},
  year={2025}
}