Adaptive Doubly Robust Estimator and Paradox Concerning Logging Policy:
Off-policy Evaluation from Dependent Samples
- OffRL
The doubly robust (DR) estimator is a crucial tool in causal inference. It depends on two nuisance parameters: the conditional mean outcome and the logging policy (the probability of choosing an action). This paper provides a DR estimator for dependent samples obtained in adaptive experiments and introduces two related topics. First, we propose adaptive-fitting, a variant of sample-splitting, for constructing an asymptotically normal semiparametric estimator from dependent samples without non-Donsker nuisance estimators. Second, we report an empirical paradox: a DR estimator that uses an estimate of the logging policy performs better than other estimators that use the true logging policy. While a similar phenomenon is known for estimators with i.i.d. samples, we hypothesize that the traditional explanations based on asymptotic efficiency cannot account for our case with dependent samples. We confirm this hypothesis through simulation studies.
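As an illustration of how the two nuisance parameters enter a DR estimator, the following is a minimal sketch of the standard DR form for off-policy evaluation, not the paper's adaptive-fitting procedure. The function name `dr_value` and all argument names are hypothetical, and the nuisance estimates (`logging_probs`, `mean_outcomes`) are assumed to be supplied by the caller; how they are fitted from dependent samples is exactly what the paper addresses and is not shown here.

```python
def dr_value(actions, rewards, eval_probs, logging_probs, mean_outcomes):
    """Doubly robust estimate of an evaluation policy's value (illustrative sketch).

    actions:       list of chosen action indices, one per round
    rewards:       list of observed rewards, one per round
    eval_probs:    per-round lists of evaluation-policy probabilities over actions
    logging_probs: per-round lists of logging-policy probabilities (true or estimated)
    mean_outcomes: per-round lists of estimated conditional mean rewards per action
    """
    total = 0.0
    rounds = zip(actions, rewards, eval_probs, logging_probs, mean_outcomes)
    for a, r, pe, pb, f in rounds:
        # Model-based term: expected reward under the evaluation policy.
        direct = sum(p * m for p, m in zip(pe, f))
        # Importance weight of the action that was actually taken.
        weight = pe[a] / pb[a]
        # Correction term: weighted residual of the outcome model.
        total += direct + weight * (r - f[a])
    return total / len(actions)


# Toy data (made up for illustration): two rounds, two actions.
estimate = dr_value(
    actions=[0, 1],
    rewards=[1.0, 0.0],
    eval_probs=[[0.5, 0.5], [0.5, 0.5]],
    logging_probs=[[0.5, 0.5], [0.5, 0.5]],
    mean_outcomes=[[0.0, 0.0], [0.0, 0.0]],
)
```

Passing either the true logging probabilities or estimated ones as `logging_probs` is the switch that the reported paradox concerns.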