Counterfactual Inference under Thompson Sampling
ACM Conference on Recommender Systems (RecSys), 2025
Main: 3 Pages
1 Figure
Bibliography: 3 Pages
Appendix: 1 Page
Abstract
Recommender systems exemplify sequential decision-making under uncertainty, strategically deciding what content to serve to users, to optimise a range of potential objectives. To balance the explore-exploit trade-off successfully, Thompson sampling provides a natural and widespread paradigm to probabilistically select which action to take. Questions of causal and counterfactual inference, which underpin use-cases like offline evaluation, are not straightforward to answer in these contexts. Specifically, whilst most existing estimators rely on action propensities, these are not readily available under Thompson sampling procedures.
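To illustrate why propensities are not readily available, consider a Beta-Bernoulli bandit: Thompson sampling draws one value per action from its posterior and plays the argmax, so the probability of choosing an action is an integral over the joint posterior with no general closed form. The sketch below approximates that probability by repeated posterior sampling. It is an illustration only, not the estimator proposed in the paper; the Beta-Bernoulli model, the function names `thompson_select` and `estimate_propensities`, the posterior parameters, and the sample count are all assumptions made for this example.

```python
import numpy as np


def thompson_select(alpha, beta, rng):
    """Pick one action: sample each arm's posterior once, play the argmax."""
    samples = rng.beta(alpha, beta)
    return int(np.argmax(samples))


def estimate_propensities(alpha, beta, n_samples=10_000, seed=0):
    """Monte Carlo estimate of P(action = a) under Thompson sampling.

    Assumes independent Beta(alpha[a], beta[a]) posteriors per arm
    (a hypothetical setup chosen for illustration).
    """
    rng = np.random.default_rng(seed)
    # One row per simulated Thompson sampling decision, one column per arm.
    draws = rng.beta(alpha, beta, size=(n_samples, len(alpha)))
    winners = np.argmax(draws, axis=1)
    # Fraction of simulated decisions in which each arm was selected.
    return np.bincount(winners, minlength=len(alpha)) / n_samples


# Hypothetical posterior parameters for three arms.
alpha = np.array([3.0, 5.0, 2.0])
beta = np.array([7.0, 5.0, 8.0])

action = thompson_select(alpha, beta, np.random.default_rng(1))
propensities = estimate_propensities(alpha, beta)
print(action, propensities)
```

The estimate is only as precise as the number of posterior samples allows, and rarely chosen actions receive noisy or zero estimates, which is one reason propensity-based estimators are awkward to apply in this setting.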
