Off-Policy Evaluation of Bandit Algorithm from Dependent Samples under Batch Update Policy

23 October 2020

Papers citing "Off-Policy Evaluation of Bandit Algorithm from Dependent Samples under Batch Update Policy"

Title
Adaptive Doubly Robust Estimator from Non-stationary Logging Policy under a Convergence of Average Probability | Masahiro Kato | OffRL | 182 0 0 | 17 Feb 2021
Policy design in experiments with unknown interference | Davide Viviano, Jess Rudder | 415 10 0 | 16 Nov 2020