2002.04014
Cited By
Statistically Efficient Off-Policy Policy Gradients
10 February 2020
Nathan Kallus
Masatoshi Uehara
OffRL
Papers citing "Statistically Efficient Off-Policy Policy Gradients"
31 / 31 papers shown
Title
A Nonparametric Off-Policy Policy Gradient
Samuele Tosatto
João Carvalho
Hany Abdulsamad
Jan Peters
OffRL
18
11
0
08 Jan 2020
AlgaeDICE: Policy Gradient from Arbitrary Experience
Ofir Nachum
Bo Dai
Ilya Kostrikov
Yinlam Chow
Lihong Li
Dale Schuurmans
OffRL
97
240
0
04 Dec 2019
From Importance Sampling to Doubly Robust Policy Gradient
Jiawei Huang
Nan Jiang
OffRL
54
24
0
20 Oct 2019
Understanding the Curse of Horizon in Off-Policy Evaluation via Conditional Importance Sampling
Yao Liu
Pierre-Luc Bacon
Emma Brunskill
OffRL
51
46
0
15 Oct 2019
Efficiently Breaking the Curse of Horizon in Off-Policy Evaluation with Double Reinforcement Learning
Nathan Kallus
Masatoshi Uehara
OffRL
49
91
0
12 Sep 2019
Introduction to Online Convex Optimization
Elad Hazan
OffRL
104
1,922
0
07 Sep 2019
Double Reinforcement Learning for Efficient Off-Policy Evaluation in Markov Decision Processes
Nathan Kallus
Masatoshi Uehara
OffRL
68
185
0
22 Aug 2019
Trajectory-wise Control Variates for Variance Reduction in Policy Gradient Methods
Ching-An Cheng
Xinyan Yan
Byron Boots
41
22
0
08 Aug 2019
On the Theory of Policy Gradient Methods: Optimality, Approximation, and Distribution Shift
Alekh Agarwal
Sham Kakade
Jason D. Lee
G. Mahajan
48
320
0
01 Aug 2019
Towards Optimal Off-Policy Evaluation for Reinforcement Learning with Marginalized Importance Sampling
Tengyang Xie
Yifei Ma
Yu Wang
OffRL
86
181
0
08 Jun 2019
Learning When-to-Treat Policies
Xinkun Nie
Emma Brunskill
Stefan Wager
CML
OffRL
52
90
0
23 May 2019
Information-Theoretic Considerations in Batch Reinforcement Learning
Jinglin Chen
Nan Jiang
OOD
OffRL
109
373
0
01 May 2019
Off-Policy Deep Reinforcement Learning without Exploration
Scott Fujimoto
David Meger
Doina Precup
OffRL
BDL
173
1,586
0
07 Dec 2018
Top-K Off-Policy Correction for a REINFORCE Recommender System
Minmin Chen
Alex Beutel
Paul Covington
Sagar Jain
Francois Belletti
Ed H. Chi
CML
OffRL
112
476
0
06 Dec 2018
An Off-policy Policy Gradient Theorem Using Emphatic Weightings
Ehsan Imani
Eric Graves
Martha White
OffRL
49
71
0
22 Nov 2018
Breaking the Curse of Horizon: Infinite-Horizon Off-Policy Estimation
Qiang Liu
Lihong Li
Ziyang Tang
Dengyong Zhou
OffRL
110
354
0
29 Oct 2018
Policy Optimization via Importance Sampling
Alberto Maria Metelli
Matteo Papini
Francesco Faccio
Marcello Restelli
OffRL
75
89
0
17 Sep 2018
Stochastic Variance-Reduced Policy Gradient
Matteo Papini
Damiano Binaghi
Giuseppe Canonaco
Matteo Pirotta
Marcello Restelli
54
174
0
14 Jun 2018
Confounding-Robust Policy Improvement
Nathan Kallus
Angela Zhou
CML
OffRL
176
152
0
22 May 2018
Convergence guarantees for a class of non-convex and non-smooth optimization problems
K. Khamaru
Martin J. Wainwright
39
72
0
25 Apr 2018
Variance Reduction for Policy Gradient with Action-Dependent Factorized Baselines
Cathy Wu
Aravind Rajeswaran
Yan Duan
Vikash Kumar
Alexandre M. Bayen
Sham Kakade
Igor Mordatch
Pieter Abbeel
OffRL
50
151
0
20 Mar 2018
Non-convex Optimization for Machine Learning
Prateek Jain
Purushottam Kar
115
480
0
21 Dec 2017
Interpolated Policy Gradient: Merging On-Policy and Off-Policy Gradient Estimation for Deep Reinforcement Learning
S. Gu
Timothy Lillicrap
Zoubin Ghahramani
Richard Turner
Bernhard Schölkopf
Sergey Levine
OffRL
67
165
0
01 Jun 2017
Balanced Policy Evaluation and Learning
Nathan Kallus
CML
OffRL
266
141
0
21 May 2017
Policy Learning with Observational Data
Susan Athey
Stefan Wager
CML
OffRL
249
183
0
09 Feb 2017
Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning
Philip S. Thomas
Emma Brunskill
OffRL
264
573
0
04 Apr 2016
Doubly Robust Off-policy Value Evaluation for Reinforcement Learning
Nan Jiang
Lihong Li
OffRL
155
621
0
11 Nov 2015
Gradient Estimation Using Stochastic Computation Graphs
John Schulman
N. Heess
T. Weber
Pieter Abbeel
OffRL
125
391
0
17 Jun 2015
High-Dimensional Continuous Control Using Generalized Advantage Estimation
John Schulman
Philipp Moritz
Sergey Levine
Michael I. Jordan
Pieter Abbeel
OffRL
50
3,368
0
08 Jun 2015
Off-Policy Actor-Critic
T. Degris
Martha White
R. Sutton
OffRL
CML
213
220
0
22 May 2012
Infinite-Horizon Policy-Gradient Estimation
Jonathan Baxter
Peter L. Bartlett
79
808
0
03 Jun 2011