Off-Policy Evaluation via Adaptive Weighting with Data from Contextual Bandits

3 June 2021

David A. Hirshberg

Papers citing "Off-Policy Evaluation via Adaptive Weighting with Data from Contextual Bandits"

Title | Authors | Tags | Counts | Date
Online Estimation and Inference for Robust Policy Evaluation in Reinforcement Learning | Weidong Liu, Jiyuan Tu, Yichen Zhang, Xi Chen | OffRL | 45 4 0 | 04 Oct 2023
Tractable contextual bandits beyond realizability | Sanath Kumar Krishnamurthy, Vitor Hadad, Susan Athey |  | 22 8 0 | 25 Oct 2020
Optimal Off-Policy Evaluation from Multiple Logging Policies | Nathan Kallus, Yuta Saito, Masatoshi Uehara | OffRL | 33 40 0 | 21 Oct 2020
On conditional versus marginal bias in multi-armed bandits | Jaehyeok Shin, Aaditya Ramdas, Alessandro Rinaldo |  | 22 12 0 | 19 Feb 2020
Inference for Batched Bandits | Kelly W. Zhang, Lucas Janson, Susan Murphy |  | 64 82 0 | 08 Feb 2020
Confidence Intervals for Policy Evaluation in Adaptive Experiments | Vitor Hadad, David A. Hirshberg, Ruohan Zhan, Stefan Wager, Susan Athey |  | 36 143 0 | 07 Nov 2019
Doubly robust off-policy evaluation with shrinkage | Yi-Hsun Su, Maria Dimakopoulou, A. Krishnamurthy, Miroslav Dudík | OffRL | 38 104 0 | 22 Jul 2019
Are sample means in multi-armed bandits positively or negatively biased? | Jaehyeok Shin, Aaditya Ramdas, Alessandro Rinaldo |  | 33 36 0 | 27 May 2019
Estimation Considerations in Contextual Bandits | Maria Dimakopoulou, Zhengyuan Zhou, Susan Athey, Guido Imbens |  | 80 69 0 | 19 Nov 2017
Why Adaptively Collected Data Have Negative Bias and How to Correct for It | Xinkun Nie, Xiaoying Tian, Jonathan E. Taylor, James Zou | OnRL | 48 88 0 | 07 Aug 2017
Effective Evaluation using Logged Bandit Feedback from Multiple Loggers | Aman Agarwal, Soumya Basu, Tobias Schnabel, Thorsten Joachims | OffRL | 90 68 0 | 17 Mar 2017
Optimal and Adaptive Off-policy Evaluation in Contextual Bandits | Yu Wang, Alekh Agarwal, Miroslav Dudík | OffRL | 59 220 0 | 04 Dec 2016
Statistical inference for the mean outcome under a possibly non-unique optimal treatment strategy | Alexander Luedtke, M. J. van der Laan |  | 65 220 0 | 24 Mar 2016
OpenML: networked science in machine learning | Joaquin Vanschoren, Jan N. van Rijn, B. Bischl, Luís Torgo | FedML, AI4CE | 103 1,310 0 | 29 Jul 2014
Thompson Sampling for Contextual Bandits with Linear Payoffs | Shipra Agrawal, Navin Goyal |  | 136 993 0 | 15 Sep 2012
Counterfactual Reasoning and Learning Systems | Léon Bottou, J. Peters, J. Q. Candela, Denis Xavier Charles, D. M. Chickering, Elon Portugaly, Dipankar Ray, Patrice Y. Simard, Edward Snelson | CML, OffRL | 180 781 0 | 11 Sep 2012
Doubly Robust Policy Evaluation and Learning | Miroslav Dudík, John Langford, Lihong Li | OffRL | 157 694 0 | 23 Mar 2011
Unbiased Offline Evaluation of Contextual-bandit-based News Article Recommendation Algorithms | Lihong Li, Wei Chu, John Langford, Xuanhui Wang | OffRL | 152 574 0 | 31 Mar 2010