1602.04951
Cited By
Q(λ) with Off-Policy Corrections
16 February 2016
Anna Harutyunyan
Marc G. Bellemare
T. Stepleton
Rémi Munos
OffRL
Papers citing
"Q($λ$) with Off-Policy Corrections"
25 / 25 papers shown
Title
Off-policy Distributional Q(λ): Distributional RL without Importance Sampling
Yunhao Tang
Mark Rowland
Rémi Munos
Bernardo Avila-Pires
Will Dabney
OffRL
23
1
0
08 Feb 2024
Conservative Exploration for Policy Optimization via Off-Policy Policy Evaluation
Paul Daoudi
Mathias Formoso
Othman Gaizi
Achraf Azize
Evrard Garcelon
OffRL
31
0
0
24 Dec 2023
Human-AI Collaboration in Real-World Complex Environment with Reinforcement Learning
Md Saiful Islam
Srijita Das
S. Gottipati
William Duguay
Clodéric Mars
Jalal Arabneydi
Antoine Fagette
Matthew J. Guzdial
Matthew E. Taylor
41
1
0
23 Dec 2023
Sequential Counterfactual Risk Minimization
Houssam Zenati
Eustache Diemert
Matthieu Martin
Julien Mairal
Pierre Gaillard
OffRL
29
3
0
23 Feb 2023
Trajectory-Aware Eligibility Traces for Off-Policy Reinforcement Learning
Brett Daley
Martha White
Chris Amato
Marlos C. Machado
OffRL
25
3
0
26 Jan 2023
Off-Policy Evaluation for Action-Dependent Non-Stationary Environments
Yash Chandak
Shiv Shankar
Nathaniel D. Bastian
Bruno Castro da Silva
Emma Brunskill
Philip S. Thomas
OffRL
52
6
0
24 Jan 2023
Knowing the Past to Predict the Future: Reinforcement Virtual Learning
Peng Zhang
Yawen Huang
Bingzhang Hu
Shizheng Wang
Haoran Duan
Noura Al Moubayed
Yefeng Zheng
Yang Long
OffRL
27
0
0
02 Nov 2022
The Nature of Temporal Difference Errors in Multi-step Distributional Reinforcement Learning
Yunhao Tang
Mark Rowland
Rémi Munos
Bernardo Avila-Pires
Will Dabney
Marc G. Bellemare
OffRL
35
11
0
15 Jul 2022
Improving the Efficiency of Off-Policy Reinforcement Learning by Accounting for Past Decisions
Brett Daley
Chris Amato
OffRL
23
1
0
23 Dec 2021
Flexible Option Learning
Martin Klissarov
Doina Precup
OffRL
41
26
0
06 Dec 2021
Supervised Off-Policy Ranking
Yue Jin
Yue Zhang
Tao Qin
Xudong Zhang
Jian Yuan
Houqiang Li
Tie-Yan Liu
OffRL
37
5
0
03 Jul 2021
A Lyapunov Theory for Finite-Sample Guarantees of Asynchronous Q-Learning and TD-Learning Variants
Zaiwei Chen
S. T. Maguluri
Sanjay Shakkottai
Karthikeyan Shanmugam
OffRL
105
54
0
02 Feb 2021
Revisiting Fundamentals of Experience Replay
W. Fedus
Prajit Ramachandran
Rishabh Agarwal
Yoshua Bengio
Hugo Larochelle
Mark Rowland
Will Dabney
KELM
OffRL
30
235
0
13 Jul 2020
Self-Imitation Learning via Generalized Lower Bound Q-learning
Yunhao Tang
SSL
33
24
0
12 Jun 2020
Empirical Study of Off-Policy Policy Evaluation for Reinforcement Learning
Cameron Voloshin
Hoang Minh Le
Nan Jiang
Yisong Yue
OffRL
35
152
0
15 Nov 2019
Gradient Q(σ, λ): A Unified Algorithm with Function Approximation for Reinforcement Learning
Long Yang
Yu Zhang
Qian Zheng
Pengfei Li
Gang Pan
20
1
0
06 Sep 2019
When to use parametric models in reinforcement learning?
H. V. Hasselt
Matteo Hessel
John Aslanides
46
189
0
12 Jun 2019
Per-decision Multi-step Temporal Difference Learning with Control Variates
Kristopher De Asis
R. Sutton
27
7
0
05 Jul 2018
Qualitative Measurements of Policy Discrepancy for Return-Based Deep Q-Network
Wenjia Meng
Qian Zheng
L. Yang
Pengfei Li
Gang Pan
20
21
0
14 Jun 2018
Sample-Efficient Deep Reinforcement Learning via Episodic Backward Update
Su Young Lee
Sung-Ik Choi
Sae-Young Chung
BDL
21
73
0
31 May 2018
Meta-Gradient Reinforcement Learning
Zhongwen Xu
H. V. Hasselt
David Silver
53
324
0
24 May 2018
A Unified Approach for Multi-step Temporal-Difference Learning with Eligibility Traces in Reinforcement Learning
Long Yang
Minhao Shi
Qian Zheng
Wenjia Meng
Gang Pan
36
23
0
09 Feb 2018
IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures
L. Espeholt
Hubert Soyer
Rémi Munos
Karen Simonyan
Volodymyr Mnih
...
Vlad Firoiu
Tim Harley
Iain Dunning
Shane Legg
Koray Kavukcuoglu
75
1,578
0
05 Feb 2018
Multi-step Off-policy Learning Without Importance Sampling Ratios
A. R. Mahmood
Huizhen Yu
R. Sutton
OffRL
26
54
0
09 Feb 2017
Safe and Efficient Off-Policy Reinforcement Learning
Rémi Munos
T. Stepleton
Anna Harutyunyan
Marc G. Bellemare
OffRL
86
609
0
08 Jun 2016