arXiv:1906.05110
Regret Minimization for Reinforcement Learning by Evaluating the Optimal Bias Function
Neural Information Processing Systems (NeurIPS), 2019
12 June 2019
Zihan Zhang
Xiangyang Ji
Papers citing "Regret Minimization for Reinforcement Learning by Evaluating the Optimal Bias Function"
50 / 52 papers shown
Finite-Time Bounds for Average-Reward Fitted Q-Iteration
Jongmin Lee
Ernest K. Ryu
OffRL
72
0
0
20 Oct 2025
Model Selection for Average Reward RL with Application to Utility Maximization in Repeated Games
Alireza Masoumian
James R. Wright
345
2
0
09 Nov 2024
Learning Infinite-Horizon Average-Reward Linear Mixture MDPs of Bounded Span
International Conference on Artificial Intelligence and Statistics (AISTATS), 2024
Woojin Chae
Kihyuk Hong
Yufan Zhang
Ambuj Tewari
Dabeen Lee
158
1
0
19 Oct 2024
Optimistic Q-learning for average reward and episodic reinforcement learning
Priyank Agrawal
Shipra Agrawal
333
6
0
18 Jul 2024
Reinforcement Learning and Regret Bounds for Admission Control
International Conference on Machine Learning (ICML), 2024
Lucas Weber
A. Busic
Jiamin Zhu
103
1
0
07 Jun 2024
Achieving Tractable Minimax Optimal Regret in Average Reward MDPs
Victor Boone
Zihan Zhang
179
7
0
03 Jun 2024
Finding good policies in average-reward Markov Decision Processes without prior knowledge
Adrienne Tuynman
Rémy Degenne
Emilie Kaufmann
204
8
0
27 May 2024
Reinforcement Learning for Infinite-Horizon Average-Reward Linear MDPs via Approximation by Discounted-Reward MDPs
International Conference on Artificial Intelligence and Statistics (AISTATS), 2024
Kihyuk Hong
Yufan Zhang
Ambuj Tewari
Dabeen Lee
317
1
0
23 May 2024
Sample-efficient Learning of Infinite-horizon Average-reward MDPs with General Function Approximation
Jianliang He
Han Zhong
Zhuoran Yang
183
6
0
19 Apr 2024
Span-Based Optimal Sample Complexity for Weakly Communicating and General Average Reward MDPs
Neural Information Processing Systems (NeurIPS), 2024
M. Zurek
Yudong Chen
269
11
0
18 Mar 2024
Dealing with unbounded gradients in stochastic saddle-point optimization
Gergely Neu
Nneka Okolo
231
5
0
21 Feb 2024
Sharper Model-free Reinforcement Learning for Average-reward Markov Decision Processes
Annual Conference on Computational Learning Theory (COLT), 2023
Zihan Zhang
Qiaomin Xie
OffRL
160
25
0
28 Jun 2023
Restarted Bayesian Online Change-point Detection for Non-Stationary Markov Decision Processes
Réda Alami
Mohammed Mahfoud
Eric Moulines
154
3
0
01 Apr 2023
Reinforcement Learning in a Birth and Death Process: Breaking the Dependence on the State Space
Neural Information Processing Systems (NeurIPS), 2023
Jonatha Anselmi
B. Gaujal
Louis-Sébastien Rebuffi
196
3
0
21 Feb 2023
Sharp Variance-Dependent Bounds in Reinforcement Learning: Best of Both Worlds in Stochastic and Deterministic Environments
International Conference on Machine Learning (ICML), 2023
Runlong Zhou
Zihan Zhang
S. Du
240
16
0
31 Jan 2023
Nearly Minimax Optimal Reinforcement Learning for Linear Markov Decision Processes
International Conference on Machine Learning (ICML), 2022
Jiafan He
Heyang Zhao
Dongruo Zhou
Quanquan Gu
OffRL
337
62
0
12 Dec 2022
Near Sample-Optimal Reduction-based Policy Learning for Average Reward MDP
Jinghan Wang
Meng-Xian Wang
Lin F. Yang
173
25
0
01 Dec 2022
Near-Optimal Regret Bounds for Multi-batch Reinforcement Learning
Neural Information Processing Systems (NeurIPS), 2022
Zihan Zhang
Yuhang Jiang
Yuanshuo Zhou
Xiangyang Ji
OffRL
167
13
0
15 Oct 2022
An Analysis of Model-Based Reinforcement Learning From Abstracted Observations
Rolf A. N. Starre
Marco Loog
E. Congeduti
F. Oliehoek
OffRL
160
3
0
30 Aug 2022
Slowly Changing Adversarial Bandit Algorithms are Efficient for Discounted MDPs
International Conference on Algorithmic Learning Theory (ALT), 2022
Ian A. Kash
L. Reyzin
Zishun Yu
258
0
0
18 May 2022
Provably Efficient Kernelized Q-Learning
Shuang Liu
H. Su
MLT
233
4
0
21 Apr 2022
Horizon-Free Reinforcement Learning in Polynomial Time: the Power of Stationary Policies
Annual Conference on Computational Learning Theory (COLT), 2022
Zihan Zhang
Xiangyang Ji
S. Du
201
28
0
24 Mar 2022
On learning Whittle index policy for restless bandits with scalable regret
IEEE Transactions on Control of Network Systems (IEEE TCNS), 2022
N. Akbarzadeh
Aditya Mahajan
233
14
0
07 Feb 2022
Learning Infinite-Horizon Average-Reward Markov Decision Processes with Constraints
International Conference on Machine Learning (ICML), 2022
Liyu Chen
R. Jain
Haipeng Luo
242
30
0
31 Jan 2022
Dueling RL: Reinforcement Learning with Trajectory Preferences
Aldo Pacchiano
Aadirupa Saha
Jonathan Lee
267
101
0
08 Nov 2021
Settling the Horizon-Dependence of Sample Complexity in Reinforcement Learning
IEEE Annual Symposium on Foundations of Computer Science (FOCS), 2021
Yuanzhi Li
Ruosong Wang
Lin F. Yang
202
21
0
01 Nov 2021
Learning Stochastic Shortest Path with Linear Function Approximation
International Conference on Machine Learning (ICML), 2021
Steffen Czolbe
Jiafan He
Adrian Dalca
Quanquan Gu
238
33
0
25 Oct 2021
Understanding Domain Randomization for Sim-to-real Transfer
Xiaoyu Chen
Jiachen Hu
Chi Jin
Lihong Li
Liwei Wang
319
146
0
07 Oct 2021
A Bayesian Learning Algorithm for Unknown Zero-sum Stochastic Games with an Arbitrary Opponent
International Conference on Artificial Intelligence and Statistics (AISTATS), 2021
Mehdi Jafarnia-Jahromi
Rahul Jain
A. Nayyar
176
5
0
08 Sep 2021
Sublinear Regret for Learning POMDPs
Production and Operations Management (POM), 2021
Yi Xiong
Ningyuan Chen
Xiang Zhou
265
25
0
08 Jul 2021
Reinforcement Learning for Markovian Bandits: Is Posterior Sampling more Scalable than Optimism?
Nicolas Gast
B. Gaujal
K. Khun
238
2
0
16 Jun 2021
Online Learning for Unknown Partially Observable MDPs
International Conference on Artificial Intelligence and Statistics (AISTATS), 2021
Mehdi Jafarnia-Jahromi
Rahul Jain
A. Nayyar
220
21
0
25 Feb 2021
Near-Optimal Randomized Exploration for Tabular Markov Decision Processes
Neural Information Processing Systems (NeurIPS), 2021
Zhihan Xiong
Ruoqi Shen
Qiwen Cui
Maryam Fazel
S. Du
148
10
0
19 Feb 2021
Causal Markov Decision Processes: Learning Good Interventions Efficiently
Yangyi Lu
A. Meisami
Ambuj Tewari
113
12
0
15 Feb 2021
Nearly Minimax Optimal Regret for Learning Infinite-horizon Average-reward MDPs with Linear Function Approximation
International Conference on Artificial Intelligence and Statistics (AISTATS), 2021
Yue Wu
Dongruo Zhou
Quanquan Gu
137
22
0
15 Feb 2021
Improved Variance-Aware Confidence Sets for Linear Bandits and Linear Mixture MDP
Neural Information Processing Systems (NeurIPS), 2021
Zihan Zhang
Jiaqi Yang
Xiangyang Ji
S. Du
273
45
0
29 Jan 2021
Nearly Minimax Optimal Reinforcement Learning for Linear Mixture Markov Decision Processes
Annual Conference on Computational Learning Theory (COLT), 2020
Dongruo Zhou
Quanquan Gu
Csaba Szepesvári
202
224
0
15 Dec 2020
RL-QN: A Reinforcement Learning Framework for Optimal Control of Queueing Systems
ACM Transactions on Modeling and Performance Evaluation of Computing Systems (ACM TOMPECS), 2020
Bai Liu
Qiaomin Xie
E. Modiano
155
22
0
14 Nov 2020
Is Reinforcement Learning More Difficult Than Bandits? A Near-optimal Algorithm Escaping the Curse of Horizon
Zihan Zhang
Xiangyang Ji
S. Du
OffRL
227
113
0
28 Sep 2020
Improved Exploration in Factored Average-Reward MDPs
International Conference on Artificial Intelligence and Statistics (AISTATS), 2020
M. S. Talebi
Anders Jonsson
Odalric-Ambrym Maillard
161
8
0
09 Sep 2020
Learning Infinite-horizon Average-reward MDPs with Linear Function Approximation
International Conference on Artificial Intelligence and Statistics (AISTATS), 2020
Chen-Yu Wei
Mehdi Jafarnia-Jahromi
Haipeng Luo
Rahul Jain
234
51
0
23 Jul 2020
A Provably Efficient Sample Collection Strategy for Reinforcement Learning
Neural Information Processing Systems (NeurIPS), 2020
Jean Tarbouriech
Matteo Pirotta
Michal Valko
A. Lazaric
OffRL
180
18
0
13 Jul 2020
Reinforcement Learning for Non-Stationary Markov Decision Processes: The Blessing of (More) Optimism
International Conference on Machine Learning (ICML), 2020
Wang Chi Cheung
D. Simchi-Levi
Ruihao Zhu
OffRL
170
107
0
24 Jun 2020
A Model-free Learning Algorithm for Infinite-horizon Average-reward MDPs with Near-optimal Regret
Mehdi Jafarnia-Jahromi
Chen-Yu Wei
Rahul Jain
Haipeng Luo
192
7
0
08 Jun 2020
Almost Optimal Model-Free Reinforcement Learning via Reference-Advantage Decomposition
Zihan Zhang
Yuanshuo Zhou
Xiangyang Ji
OffRL
161
172
0
21 Apr 2020
Tightening Exploration in Upper Confidence Reinforcement Learning
International Conference on Machine Learning (ICML), 2020
Hippolyte Bourel
Odalric-Ambrym Maillard
M. S. Talebi
186
35
0
20 Apr 2020
Upper Confidence Primal-Dual Reinforcement Learning for CMDP with Adversarial Loss
Neural Information Processing Systems (NeurIPS), 2020
Delin Qu
Xiaohan Wei
Zhuoran Yang
Jieping Ye
Zhaoran Wang
320
55
0
02 Mar 2020
Learning Near Optimal Policies with Low Inherent Bellman Error
International Conference on Machine Learning (ICML), 2020
Andrea Zanette
A. Lazaric
Mykel Kochenderfer
Emma Brunskill
OffRL
259
232
0
29 Feb 2020
Near-optimal Regret Bounds for Stochastic Shortest Path
International Conference on Machine Learning (ICML), 2020
Alon Cohen
Haim Kaplan
Yishay Mansour
Aviv A. Rosenberg
160
60
0
23 Feb 2020
Learning Adversarial MDPs with Bandit Feedback and Unknown Transition
International Conference on Machine Learning (ICML), 2019
Chi Jin
Tiancheng Jin
Haipeng Luo
S. Sra
Tiancheng Yu
324
112
0
03 Dec 2019