Efficient Counterfactual Learning from Bandit Feedback

10 September 2018

Papers citing "Efficient Counterfactual Learning from Bandit Feedback"

| Title | Authors | Tags | | Date |
|---|---|---|---|---|
| Open Bandit Dataset and Pipeline: Towards Realistic and Reproducible Off-Policy Evaluation | Yuta Saito, Shunsuke Aihara, Megumi Matsutani, Yusuke Narita | OffRL | 124 75 0 | 17 Aug 2020 |
| Estimation Considerations in Contextual Bandits | Maria Dimakopoulou, Zhengyuan Zhou, Susan Athey, Guido Imbens | | 115 69 0 | 19 Nov 2017 |
| Effective Evaluation using Logged Bandit Feedback from Multiple Loggers | Aman Agarwal, Soumya Basu, Tobias Schnabel, Thorsten Joachims | OffRL | 101 68 0 | 17 Mar 2017 |
| Optimal and Adaptive Off-policy Evaluation in Contextual Bandits | Yu Wang, Alekh Agarwal, Miroslav Dudík | OffRL | 70 220 0 | 04 Dec 2016 |
| Safe and Efficient Off-Policy Reinforcement Learning | Rémi Munos, T. Stepleton, Anna Harutyunyan, Marc G. Bellemare | OffRL | 130 611 0 | 08 Jun 2016 |
| Off-policy evaluation for slate recommendation | Adith Swaminathan, A. Krishnamurthy, Alekh Agarwal, Miroslav Dudík, John Langford, Damien Jose, I. Zitouni | CML, OffRL | 48 227 0 | 16 May 2016 |
| Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning | Philip S. Thomas, Emma Brunskill | OffRL | 264 573 0 | 04 Apr 2016 |
| Estimation and Inference of Heterogeneous Treatment Effects using Random Forests | Stefan Wager, Susan Athey | SyDa, CML | 190 2,474 0 | 14 Oct 2015 |
| Doubly Robust Policy Evaluation and Optimization | Miroslav Dudík, D. Erhan, John Langford, Lihong Li | OffRL | 145 285 0 | 10 Mar 2015 |
| Counterfactual Reasoning and Learning Systems | Léon Bottou, J. Peters, J. Q. Candela, Denis Xavier Charles, D. M. Chickering, Elon Portugaly, Dipankar Ray, Patrice Y. Simard, Edward Snelson | CML, OffRL | 225 781 0 | 11 Sep 2012 |
| Unbiased Offline Evaluation of Contextual-bandit-based News Article Recommendation Algorithms | Lihong Li, Wei Chu, John Langford, Xuanhui Wang | OffRL | 168 574 0 | 31 Mar 2010 |
| A Contextual-Bandit Approach to Personalized News Article Recommendation | Lihong Li, Wei Chu, John Langford, Robert Schapire | | 317 2,935 0 | 28 Feb 2010 |
| Learning from Logged Implicit Exploration Data | Alexander L. Strehl, John Langford, Sham Kakade, Lihong Li | OffRL | 123 254 0 | 27 Feb 2010 |
| Semiparametric efficiency in GMM models with auxiliary data | Xiaohong Chen, H. Hong, Alessandro Tarozzi | | 386 234 0 | 01 May 2007 |