Regret Minimization for Reinforcement Learning by Evaluating the Optimal Bias Function

Neural Information Processing Systems (NeurIPS), 2019
12 June 2019
Zihan Zhang
Xiangyang Ji

Papers citing "Regret Minimization for Reinforcement Learning by Evaluating the Optimal Bias Function"

50 / 52 papers shown
Finite-Time Bounds for Average-Reward Fitted Q-Iteration
Jongmin Lee
Ernest K. Ryu
OffRL
72
0
0
20 Oct 2025
Model Selection for Average Reward RL with Application to Utility Maximization in Repeated Games
Alireza Masoumian
James R. Wright
345
2
0
09 Nov 2024
Learning Infinite-Horizon Average-Reward Linear Mixture MDPs of Bounded Span
International Conference on Artificial Intelligence and Statistics (AISTATS), 2024
Woojin Chae
Kihyuk Hong
Yufan Zhang
Ambuj Tewari
Dabeen Lee
158
1
0
19 Oct 2024
Optimistic Q-learning for average reward and episodic reinforcement learning
Priyank Agrawal
Shipra Agrawal
333
6
0
18 Jul 2024
Reinforcement Learning and Regret Bounds for Admission Control
International Conference on Machine Learning (ICML), 2024
Lucas Weber
A. Busic
Jiamin Zhu
103
1
0
07 Jun 2024
Achieving Tractable Minimax Optimal Regret in Average Reward MDPs
Victor Boone
Zihan Zhang
179
7
0
03 Jun 2024
Finding good policies in average-reward Markov Decision Processes without prior knowledge
Adrienne Tuynman
Rémy Degenne
Emilie Kaufmann
204
8
0
27 May 2024
Reinforcement Learning for Infinite-Horizon Average-Reward Linear MDPs via Approximation by Discounted-Reward MDPs
International Conference on Artificial Intelligence and Statistics (AISTATS), 2024
Kihyuk Hong
Yufan Zhang
Ambuj Tewari
Dabeen Lee
317
1
0
23 May 2024
Sample-efficient Learning of Infinite-horizon Average-reward MDPs with General Function Approximation
Jianliang He
Han Zhong
Zhuoran Yang
183
6
0
19 Apr 2024
Span-Based Optimal Sample Complexity for Weakly Communicating and General Average Reward MDPs
Neural Information Processing Systems (NeurIPS), 2024
M. Zurek
Yudong Chen
269
11
0
18 Mar 2024
Dealing with unbounded gradients in stochastic saddle-point optimization
Gergely Neu
Nneka Okolo
231
5
0
21 Feb 2024
Sharper Model-free Reinforcement Learning for Average-reward Markov Decision Processes
Annual Conference Computational Learning Theory (COLT), 2023
Zihan Zhang
Qiaomin Xie
OffRL
160
25
0
28 Jun 2023
Restarted Bayesian Online Change-point Detection for Non-Stationary Markov Decision Processes
Réda Alami
Mohammed Mahfoud
Eric Moulines
154
3
0
01 Apr 2023
Reinforcement Learning in a Birth and Death Process: Breaking the Dependence on the State Space
Neural Information Processing Systems (NeurIPS), 2023
Jonatha Anselmi
B. Gaujal
Louis-Sébastien Rebuffi
196
3
0
21 Feb 2023
Sharp Variance-Dependent Bounds in Reinforcement Learning: Best of Both Worlds in Stochastic and Deterministic Environments
International Conference on Machine Learning (ICML), 2023
Runlong Zhou
Zihan Zhang
S. Du
240
16
0
31 Jan 2023
Nearly Minimax Optimal Reinforcement Learning for Linear Markov Decision Processes
International Conference on Machine Learning (ICML), 2022
Jiafan He
Heyang Zhao
Dongruo Zhou
Quanquan Gu
OffRL
337
62
0
12 Dec 2022
Near Sample-Optimal Reduction-based Policy Learning for Average Reward MDP
Jinghan Wang
Meng-Xian Wang
Lin F. Yang
173
25
0
01 Dec 2022
Near-Optimal Regret Bounds for Multi-batch Reinforcement Learning
Neural Information Processing Systems (NeurIPS), 2022
Zihan Zhang
Yuhang Jiang
Yuanshuo Zhou
Xiangyang Ji
OffRL
167
13
0
15 Oct 2022
An Analysis of Model-Based Reinforcement Learning From Abstracted Observations
Rolf A. N. Starre
Marco Loog
E. Congeduti
F. Oliehoek
OffRL
160
3
0
30 Aug 2022
Slowly Changing Adversarial Bandit Algorithms are Efficient for Discounted MDPs
International Conference on Algorithmic Learning Theory (ALT), 2022
Ian A. Kash
L. Reyzin
Zishun Yu
258
0
0
18 May 2022
Provably Efficient Kernelized Q-Learning
Shuang Liu
H. Su
MLT
233
4
0
21 Apr 2022
Horizon-Free Reinforcement Learning in Polynomial Time: the Power of Stationary Policies
Annual Conference Computational Learning Theory (COLT), 2022
Zihan Zhang
Xiangyang Ji
S. Du
201
28
0
24 Mar 2022
On learning Whittle index policy for restless bandits with scalable regret
IEEE Transactions on Control of Network Systems (IEEE TCNS), 2022
N. Akbarzadeh
Aditya Mahajan
233
14
0
07 Feb 2022
Learning Infinite-Horizon Average-Reward Markov Decision Processes with Constraints
International Conference on Machine Learning (ICML), 2022
Liyu Chen
R. Jain
Haipeng Luo
242
30
0
31 Jan 2022
Dueling RL: Reinforcement Learning with Trajectory Preferences
Aldo Pacchiano
Aadirupa Saha
Jonathan Lee
267
101
0
08 Nov 2021
Settling the Horizon-Dependence of Sample Complexity in Reinforcement Learning
IEEE Annual Symposium on Foundations of Computer Science (FOCS), 2021
Yuanzhi Li
Ruosong Wang
Lin F. Yang
202
21
0
01 Nov 2021
Learning Stochastic Shortest Path with Linear Function Approximation
International Conference on Machine Learning (ICML), 2021
Steffen Czolbe
Jiafan He
Adrian Dalca
Quanquan Gu
238
33
0
25 Oct 2021
Understanding Domain Randomization for Sim-to-real Transfer
Xiaoyu Chen
Jiachen Hu
Chi Jin
Lihong Li
Liwei Wang
319
146
0
07 Oct 2021
A Bayesian Learning Algorithm for Unknown Zero-sum Stochastic Games with an Arbitrary Opponent
International Conference on Artificial Intelligence and Statistics (AISTATS), 2021
Mehdi Jafarnia-Jahromi
Rahul Jain
A. Nayyar
176
5
0
08 Sep 2021
Sublinear Regret for Learning POMDPs
Production and Operations Management (POM), 2021
Yi Xiong
Ningyuan Chen
Xiang Zhou
265
25
0
08 Jul 2021
Reinforcement Learning for Markovian Bandits: Is Posterior Sampling more Scalable than Optimism?
Nicolas Gast
B. Gaujal
K. Khun
238
2
0
16 Jun 2021
Online Learning for Unknown Partially Observable MDPs
International Conference on Artificial Intelligence and Statistics (AISTATS), 2021
Mehdi Jafarnia-Jahromi
Rahul Jain
A. Nayyar
220
21
0
25 Feb 2021
Near-Optimal Randomized Exploration for Tabular Markov Decision Processes
Neural Information Processing Systems (NeurIPS), 2021
Zhihan Xiong
Ruoqi Shen
Qiwen Cui
Maryam Fazel
S. Du
148
10
0
19 Feb 2021
Causal Markov Decision Processes: Learning Good Interventions Efficiently
Yangyi Lu
A. Meisami
Ambuj Tewari
113
12
0
15 Feb 2021
Nearly Minimax Optimal Regret for Learning Infinite-horizon Average-reward MDPs with Linear Function Approximation
International Conference on Artificial Intelligence and Statistics (AISTATS), 2021
Yue Wu
Dongruo Zhou
Quanquan Gu
137
22
0
15 Feb 2021
Improved Variance-Aware Confidence Sets for Linear Bandits and Linear Mixture MDP
Neural Information Processing Systems (NeurIPS), 2021
Zihan Zhang
Jiaqi Yang
Xiangyang Ji
S. Du
273
45
0
29 Jan 2021
Nearly Minimax Optimal Reinforcement Learning for Linear Mixture Markov Decision Processes
Annual Conference Computational Learning Theory (COLT), 2020
Dongruo Zhou
Quanquan Gu
Csaba Szepesvári
202
224
0
15 Dec 2020
RL-QN: A Reinforcement Learning Framework for Optimal Control of Queueing Systems
ACM Transactions on Modeling and Performance Evaluation of Computing Systems (ACM TOMPECS), 2020
Bai Liu
Qiaomin Xie
E. Modiano
155
22
0
14 Nov 2020
Is Reinforcement Learning More Difficult Than Bandits? A Near-optimal Algorithm Escaping the Curse of Horizon
Zihan Zhang
Xiangyang Ji
S. Du
OffRL
227
113
0
28 Sep 2020
Improved Exploration in Factored Average-Reward MDPs
International Conference on Artificial Intelligence and Statistics (AISTATS), 2020
M. S. Talebi
Anders Jonsson
Odalric-Ambrym Maillard
161
8
0
09 Sep 2020
Learning Infinite-horizon Average-reward MDPs with Linear Function Approximation
International Conference on Artificial Intelligence and Statistics (AISTATS), 2020
Chen-Yu Wei
Mehdi Jafarnia-Jahromi
Haipeng Luo
Rahul Jain
234
51
0
23 Jul 2020
A Provably Efficient Sample Collection Strategy for Reinforcement Learning
Neural Information Processing Systems (NeurIPS), 2020
Jean Tarbouriech
Matteo Pirotta
Michal Valko
A. Lazaric
OffRL
180
18
0
13 Jul 2020
Reinforcement Learning for Non-Stationary Markov Decision Processes: The Blessing of (More) Optimism
International Conference on Machine Learning (ICML), 2020
Wang Chi Cheung
D. Simchi-Levi
Ruihao Zhu
OffRL
170
107
0
24 Jun 2020
A Model-free Learning Algorithm for Infinite-horizon Average-reward MDPs with Near-optimal Regret
Mehdi Jafarnia-Jahromi
Chen-Yu Wei
Rahul Jain
Haipeng Luo
192
7
0
08 Jun 2020
Almost Optimal Model-Free Reinforcement Learning via Reference-Advantage Decomposition
Zihan Zhang
Yuanshuo Zhou
Xiangyang Ji
OffRL
161
172
0
21 Apr 2020
Tightening Exploration in Upper Confidence Reinforcement Learning
International Conference on Machine Learning (ICML), 2020
Hippolyte Bourel
Odalric-Ambrym Maillard
M. S. Talebi
186
35
0
20 Apr 2020
Upper Confidence Primal-Dual Reinforcement Learning for CMDP with Adversarial Loss
Neural Information Processing Systems (NeurIPS), 2020
Delin Qu
Xiaohan Wei
Zhuoran Yang
Jieping Ye
Zhaoran Wang
320
55
0
02 Mar 2020
Learning Near Optimal Policies with Low Inherent Bellman Error
International Conference on Machine Learning (ICML), 2020
Andrea Zanette
A. Lazaric
Mykel Kochenderfer
Emma Brunskill
OffRL
259
232
0
29 Feb 2020
Near-optimal Regret Bounds for Stochastic Shortest Path
International Conference on Machine Learning (ICML), 2020
Alon Cohen
Haim Kaplan
Yishay Mansour
Aviv A. Rosenberg
160
60
0
23 Feb 2020
Learning Adversarial MDPs with Bandit Feedback and Unknown Transition
International Conference on Machine Learning (ICML), 2019
Chi Jin
Tiancheng Jin
Haipeng Luo
S. Sra
Tiancheng Yu
324
112
0
03 Dec 2019