1905.10506
Cited By
A Kernel Loss for Solving the Bellman Equation
Neural Information Processing Systems (NeurIPS), 2019
25 May 2019
Yihao Feng
Lihong Li
Qiang Liu
Papers citing "A Kernel Loss for Solving the Bellman Equation"
47 / 47 papers shown
Title
Offline Reinforcement Learning in Large State Spaces: Algorithms and Guarantees
Nan Jiang
Tengyang Xie
OffRL
148
10
0
05 Oct 2025
Neural Network Approximation for Pessimistic Offline Reinforcement Learning
Di Wu
Yuling Jiao
Li Shen
Haizhao Yang
Xiliang Lu
OffRL
242
1
0
19 Dec 2023
Distributional Shift-Aware Off-Policy Interval Estimation: A Unified Error Quantification Framework
Wenzhuo Zhou
Yuhan Li
Ruoqing Zhu
Annie Qu
OffRL
245
7
0
23 Sep 2023
A new Gradient TD Algorithm with only One Step-size: Convergence Rate Analysis using L-λ Smoothness
Hengshuai Yao
268
3
0
29 Jul 2023
LLQL: Logistic Likelihood Q-Learning for Reinforcement Learning
Outongyi Lv
Bingxin Zhou
OffRL
256
0
0
05 Jul 2023
TD Convergence: An Optimization Perspective
Neural Information Processing Systems (NeurIPS), 2023
Kavosh Asadi
Shoham Sabach
Yao Liu
Omer Gottesman
Rasool Fakoor
MU
241
12
0
30 Jun 2023
K-Nearest-Neighbor Resampling for Off-Policy Evaluation in Stochastic Control
Michael Giegrich
Roel Oomen
C. Reisinger
OffRL
184
2
0
07 Jun 2023
Distributional Offline Policy Evaluation with Predictive Error Guarantees
International Conference on Machine Learning (ICML), 2023
Runzhe Wu
Masatoshi Uehara
Wen Sun
OffRL
193
18
0
19 Feb 2023
Provably Efficient Offline Goal-Conditioned Reinforcement Learning with General Function Approximation and Single-Policy Concentrability
Neural Information Processing Systems (NeurIPS), 2023
Hanlin Zhu
Amy Zhang
OffRL
242
5
0
07 Feb 2023
Importance Weighted Actor-Critic for Optimal Conservative Offline Reinforcement Learning
Neural Information Processing Systems (NeurIPS), 2023
Hanlin Zhu
Paria Rashidinejad
Jiantao Jiao
OffRL
319
19
0
30 Jan 2023
Control of Continuous Quantum Systems with Many Degrees of Freedom based on Convergent Reinforcement Learning
Zhikang T. Wang
122
0
0
21 Dec 2022
A Review of Off-Policy Evaluation in Reinforcement Learning
Masatoshi Uehara
C. Shi
Nathan Kallus
OffRL
210
99
0
13 Dec 2022
Optimal Conservative Offline RL with General Function Approximation via Augmented Lagrangian
International Conference on Learning Representations (ICLR), 2022
Paria Rashidinejad
Hanlin Zhu
Kunhe Yang
Stuart J. Russell
Jiantao Jiao
OffRL
340
32
0
01 Nov 2022
A Unified Framework for Alternating Offline Model Training and Policy Learning
Neural Information Processing Systems (NeurIPS), 2022
Shentao Yang
Shujian Zhang
Yihao Feng
Mi Zhou
OffRL
212
17
0
12 Oct 2022
Inference on Strongly Identified Functionals of Weakly Identified Functions
Annual Conference Computational Learning Theory (COLT), 2022
Andrew Bennett
Nathan Kallus
Xiaojie Mao
Whitney Newey
Vasilis Syrgkanis
Masatoshi Uehara
286
21
0
17 Aug 2022
Future-Dependent Value-Based Off-Policy Evaluation in POMDPs
Neural Information Processing Systems (NeurIPS), 2022
Masatoshi Uehara
Haruka Kiyohara
Andrew Bennett
Victor Chernozhukov
Nan Jiang
Nathan Kallus
C. Shi
Wen Sun
OffRL
373
23
0
26 Jul 2022
Robust Losses for Learning Value Functions
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Andrew Patterson
Victor Liao
Martha White
256
17
0
17 May 2022
Why Should I Trust You, Bellman? The Bellman Error is a Poor Replacement for Value Error
International Conference on Machine Learning (ICML), 2022
Scott Fujimoto
David Meger
Doina Precup
Ofir Nachum
S. Gu
277
37
0
28 Jan 2022
Hyperparameter Selection Methods for Fitted Q-Evaluation with Error Guarantee
Kohei Miyaguchi
OffRL
223
1
0
07 Jan 2022
A Minimax Learning Approach to Off-Policy Evaluation in Confounded Partially Observable Markov Decision Processes
International Conference on Machine Learning (ICML), 2021
C. Shi
Masatoshi Uehara
Jiawei Huang
Nan Jiang
OffRL
257
28
0
12 Nov 2021
Towards Hyperparameter-free Policy Selection for Offline Reinforcement Learning
Siyuan Zhang
Nan Jiang
OffRL
310
40
0
26 Oct 2021
Optimal policy evaluation using kernel-based temporal difference methods
Annals of Statistics (Ann. Stat.), 2021
Yaqi Duan
Mengdi Wang
Martin J. Wainwright
OffRL
140
30
0
24 Sep 2021
Convergent and Efficient Deep Q Network Algorithm
Zhikang T. Wang
Masahito Ueda
192
14
0
29 Jun 2021
Bayesian Bellman Operators
Neural Information Processing Systems (NeurIPS), 2021
M. Fellows
Kristian Hartikainen
Shimon Whiteson
OffRL
140
18
0
09 Jun 2021
Instrument Space Selection for Kernel Maximum Moment Restriction
Rui Zhang
Krikamol Muandet
Bernhard Schölkopf
Masaaki Imaizumi
123
3
0
07 Jun 2021
Optimal Uniform OPE and Model-based Offline Reinforcement Learning in Time-Homogeneous, Reward-Free and Task-Agnostic Settings
Neural Information Processing Systems (NeurIPS), 2021
Ming Yin
Yu Wang
OffRL
249
19
0
13 May 2021
Causal Inference Under Unmeasured Confounding With Negative Controls: A Minimax Learning Approach
Nathan Kallus
Xiaojie Mao
Masatoshi Uehara
CML
333
73
0
25 Mar 2021
Bridging Offline Reinforcement Learning and Imitation Learning: A Tale of Pessimism
IEEE Transactions on Information Theory (IEEE Trans. Inf. Theory), 2021
Paria Rashidinejad
Banghua Zhu
Cong Ma
Jiantao Jiao
Stuart J. Russell
OffRL
666
309
0
22 Mar 2021
Non-asymptotic Confidence Intervals of Off-policy Evaluation: Primal and Dual Bounds
International Conference on Learning Representations (ICLR), 2021
Yihao Feng
Ziyang Tang
Na Zhang
Qiang Liu
OffRL
178
14
0
09 Mar 2021
Minimax Model Learning
International Conference on Artificial Intelligence and Statistics (AISTATS), 2021
Cameron Voloshin
Nan Jiang
Yisong Yue
OffRL
261
19
0
02 Mar 2021
Off-Policy Interval Estimation with Lipschitz Value Iteration
Neural Information Processing Systems (NeurIPS), 2020
Ziyang Tang
Yihao Feng
Na Zhang
Jian Peng
Qiang Liu
OffRL
127
6
0
29 Oct 2020
Logistic Q-Learning
Joan Bas-Serrano
Sebastian Curi
Andreas Krause
Gergely Neu
279
42
0
21 Oct 2020
Instrumental Variable Regression via Kernel Maximum Moment Loss
Rui Zhang
Masaaki Imaizumi
Bernhard Schölkopf
Krikamol Muandet
285
13
0
15 Oct 2020
Accountable Off-Policy Evaluation With Kernel Bellman Statistics
Yihao Feng
Zhaolin Ren
Ziyang Tang
Qiang Liu
OffRL
208
45
0
15 Aug 2020
Convex Q-Learning, Part 1: Deterministic Optimal Control
P. Mehta
Sean P. Meyn
144
4
0
08 Aug 2020
Proximal Deterministic Policy Gradient
Marco Maggipinto
Gian Antonio Susto
Pratik Chaudhari
OffRL
103
5
0
03 Aug 2020
Towards a practical measure of interference for reinforcement learning
Vincent Liu
Adam White
Hengshuai Yao
Martha White
147
6
0
07 Jul 2020
Gradient Temporal-Difference Learning with Regularized Corrections
Sina Ghiassian
Andrew Patterson
Shivam Garg
Dhawal Gupta
Adam White
Martha White
338
45
0
01 Jul 2020
Band-limited Soft Actor Critic Model
Miguel Campo
Zhengxing Chen
Luke Kung
Kittipat Virochsiri
Jianyu Wang
109
6
0
19 Jun 2020
A maximum-entropy approach to off-policy evaluation in average-reward MDPs
N. Lazić
Dong Yin
Mehrdad Farajtabar
Nir Levine
Dilan Görür
Chris Harris
Dale Schuurmans
OffRL
151
12
0
17 Jun 2020
Q* Approximation Schemes for Batch Reinforcement Learning: A Theoretical Comparison
Tengyang Xie
Nan Jiang
229
34
0
09 Mar 2020
Kalman meets Bellman: Improving Policy Evaluation through Value Tracking
Shirli Di-Castro Shashua
Shie Mannor
OffRL
141
14
0
17 Feb 2020
Minimax Value Interval for Off-Policy Evaluation and Policy Optimization
Nan Jiang
Jiawei Huang
OffRL
347
17
0
06 Feb 2020
AlgaeDICE: Policy Gradient from Arbitrary Experience
Ofir Nachum
Bo Dai
Ilya Kostrikov
Yinlam Chow
Lihong Li
Dale Schuurmans
OffRL
271
255
0
04 Dec 2019
Minimax Weight and Q-Function Learning for Off-Policy Evaluation
International Conference on Machine Learning (ICML), 2019
Masatoshi Uehara
Jiawei Huang
Nan Jiang
OffRL
352
193
0
28 Oct 2019
Doubly Robust Bias Reduction in Infinite Horizon Off-Policy Estimation
International Conference on Learning Representations (ICLR), 2019
Ziyang Tang
Yihao Feng
Lihong Li
Dengyong Zhou
Qiang Liu
OffRL
243
70
0
16 Oct 2019
Deep Residual Reinforcement Learning
Adaptive Agents and Multi-Agent Systems (AAMAS), 2019
Shangtong Zhang
Wendelin Böhmer
Shimon Whiteson
333
34
0
03 May 2019