
Regret Minimization for Reinforcement Learning by Evaluating the Optimal Bias Function

Neural Information Processing Systems (NeurIPS), 2019

12 June 2019

Zihan Zhang

Xiangyang Ji


Papers citing "Regret Minimization for Reinforcement Learning by Evaluating the Optimal Bias Function"

50 / 52 papers shown

Finite-Time Bounds for Average-Reward Fitted Q-Iteration. Jongmin Lee, Ernest K. Ryu. 20 Oct 2025. [OffRL]
Model Selection for Average Reward RL with Application to Utility Maximization in Repeated Games. Alireza Masoumian, James R. Wright. 09 Nov 2024.
Learning Infinite-Horizon Average-Reward Linear Mixture MDPs of Bounded Span. Woojin Chae, Kihyuk Hong, Yufan Zhang, Ambuj Tewari, Dabeen Lee. International Conference on Artificial Intelligence and Statistics (AISTATS), 2024. 19 Oct 2024.
Optimistic Q-learning for average reward and episodic reinforcement learning. Priyank Agrawal, Shipra Agrawal. 18 Jul 2024.
Reinforcement Learning and Regret Bounds for Admission Control. Lucas Weber, A. Busic, Jiamin Zhu. International Conference on Machine Learning (ICML), 2024. 07 Jun 2024.
Achieving Tractable Minimax Optimal Regret in Average Reward MDPs. Victor Boone, Zihan Zhang. 03 Jun 2024.
Finding good policies in average-reward Markov Decision Processes without prior knowledge. Adrienne Tuynman, Rémy Degenne, Emilie Kaufmann. 27 May 2024.
Reinforcement Learning for Infinite-Horizon Average-Reward Linear MDPs via Approximation by Discounted-Reward MDPs. Kihyuk Hong, Yufan Zhang, Ambuj Tewari, Dabeen Lee. International Conference on Artificial Intelligence and Statistics (AISTATS), 2024. 23 May 2024.
Sample-efficient Learning of Infinite-horizon Average-reward MDPs with General Function Approximation. Jianliang He, Han Zhong, Zhuoran Yang. 19 Apr 2024.
Span-Based Optimal Sample Complexity for Weakly Communicating and General Average Reward MDPs. M. Zurek, Yudong Chen. Neural Information Processing Systems (NeurIPS), 2024. 18 Mar 2024.
Dealing with unbounded gradients in stochastic saddle-point optimization. Gergely Neu, Nneka Okolo. 21 Feb 2024.
Sharper Model-free Reinforcement Learning for Average-reward Markov Decision Processes. Zihan Zhang, Qiaomin Xie. Annual Conference on Computational Learning Theory (COLT), 2023. 28 Jun 2023. [OffRL]
Restarted Bayesian Online Change-point Detection for Non-Stationary Markov Decision Processes. Réda Alami, Mohammed Mahfoud, Eric Moulines. 01 Apr 2023.
Reinforcement Learning in a Birth and Death Process: Breaking the Dependence on the State Space. Jonatha Anselmi, B. Gaujal, Louis-Sébastien Rebuffi. Neural Information Processing Systems (NeurIPS), 2023. 21 Feb 2023.
Sharp Variance-Dependent Bounds in Reinforcement Learning: Best of Both Worlds in Stochastic and Deterministic Environments. Runlong Zhou, Zihan Zhang, S. Du. International Conference on Machine Learning (ICML), 2023. 31 Jan 2023.
Nearly Minimax Optimal Reinforcement Learning for Linear Markov Decision Processes. Jiafan He, Heyang Zhao, Dongruo Zhou, Quanquan Gu. International Conference on Machine Learning (ICML), 2022. 12 Dec 2022. [OffRL]
Near Sample-Optimal Reduction-based Policy Learning for Average Reward MDP. Jinghan Wang, Meng-Xian Wang, Lin F. Yang. 01 Dec 2022.
Near-Optimal Regret Bounds for Multi-batch Reinforcement Learning. Zihan Zhang, Yuhang Jiang, Yuanshuo Zhou, Xiangyang Ji. Neural Information Processing Systems (NeurIPS), 2022. 15 Oct 2022. [OffRL]
An Analysis of Model-Based Reinforcement Learning From Abstracted Observations. Rolf A. N. Starre, Marco Loog, E. Congeduti, F. Oliehoek. 30 Aug 2022. [OffRL]
Slowly Changing Adversarial Bandit Algorithms are Efficient for Discounted MDPs. Ian A. Kash, L. Reyzin, Zishun Yu. International Conference on Algorithmic Learning Theory (ALT), 2022. 18 May 2022.
Provably Efficient Kernelized Q-Learning. Shuang Liu, H. Su. 21 Apr 2022. [MLT]
Horizon-Free Reinforcement Learning in Polynomial Time: the Power of Stationary Policies. Zihan Zhang, Xiangyang Ji, S. Du. Annual Conference on Computational Learning Theory (COLT), 2022. 24 Mar 2022.
On learning Whittle index policy for restless bandits with scalable regret. N. Akbarzadeh, Aditya Mahajan. IEEE Transactions on Control of Network Systems (IEEE TCNS), 2022. 07 Feb 2022.
Learning Infinite-Horizon Average-Reward Markov Decision Processes with Constraints. Liyu Chen, R. Jain, Haipeng Luo. International Conference on Machine Learning (ICML), 2022. 31 Jan 2022.
Dueling RL: Reinforcement Learning with Trajectory Preferences. Aldo Pacchiano, Aadirupa Saha, Jonathan Lee. 08 Nov 2021.
Settling the Horizon-Dependence of Sample Complexity in Reinforcement Learning. Yuanzhi Li, Ruosong Wang, Lin F. Yang. IEEE Annual Symposium on Foundations of Computer Science (FOCS), 2021. 01 Nov 2021.
Learning Stochastic Shortest Path with Linear Function Approximation. Yifei Min, Jiafan He, Tianhao Wang, Quanquan Gu. International Conference on Machine Learning (ICML), 2021. 25 Oct 2021.
Understanding Domain Randomization for Sim-to-real Transfer. Xiaoyu Chen, Jiachen Hu, Chi Jin, Lihong Li, Liwei Wang. 07 Oct 2021.
A Bayesian Learning Algorithm for Unknown Zero-sum Stochastic Games with an Arbitrary Opponent. Mehdi Jafarnia-Jahromi, Rahul Jain, A. Nayyar. International Conference on Artificial Intelligence and Statistics (AISTATS), 2021. 08 Sep 2021.
Sublinear Regret for Learning POMDPs. Yi Xiong, Ningyuan Chen, Xiang Zhou. Production and Operations Management (POM), 2021. 08 Jul 2021.
Reinforcement Learning for Markovian Bandits: Is Posterior Sampling more Scalable than Optimism? Nicolas Gast, B. Gaujal, K. Khun. 16 Jun 2021.
Online Learning for Unknown Partially Observable MDPs. Mehdi Jafarnia-Jahromi, Rahul Jain, A. Nayyar. International Conference on Artificial Intelligence and Statistics (AISTATS), 2021. 25 Feb 2021.
Near-Optimal Randomized Exploration for Tabular Markov Decision Processes. Zhihan Xiong, Ruoqi Shen, Qiwen Cui, Maryam Fazel, S. Du. Neural Information Processing Systems (NeurIPS), 2021. 19 Feb 2021.
Causal Markov Decision Processes: Learning Good Interventions Efficiently. Yangyi Lu, A. Meisami, Ambuj Tewari. 15 Feb 2021.
Nearly Minimax Optimal Regret for Learning Infinite-horizon Average-reward MDPs with Linear Function Approximation. Yue Wu, Dongruo Zhou, Quanquan Gu. International Conference on Artificial Intelligence and Statistics (AISTATS), 2021. 15 Feb 2021.
Improved Variance-Aware Confidence Sets for Linear Bandits and Linear Mixture MDP. Zihan Zhang, Jiaqi Yang, Xiangyang Ji, S. Du. Neural Information Processing Systems (NeurIPS), 2021. 29 Jan 2021.
Nearly Minimax Optimal Reinforcement Learning for Linear Mixture Markov Decision Processes. Dongruo Zhou, Quanquan Gu, Csaba Szepesvári. Annual Conference on Computational Learning Theory (COLT), 2020. 15 Dec 2020.
RL-QN: A Reinforcement Learning Framework for Optimal Control of Queueing Systems. Bai Liu, Qiaomin Xie, E. Modiano. ACM Transactions on Modeling and Performance Evaluation of Computing Systems (ACM TOMPECS), 2020. 14 Nov 2020.
Is Reinforcement Learning More Difficult Than Bandits? A Near-optimal Algorithm Escaping the Curse of Horizon. Zihan Zhang, Xiangyang Ji, S. Du. 28 Sep 2020. [OffRL]
Improved Exploration in Factored Average-Reward MDPs. M. S. Talebi, Anders Jonsson, Odalric-Ambrym Maillard. International Conference on Artificial Intelligence and Statistics (AISTATS), 2020. 09 Sep 2020.
Learning Infinite-horizon Average-reward MDPs with Linear Function Approximation. Chen-Yu Wei, Mehdi Jafarnia-Jahromi, Haipeng Luo, Rahul Jain. International Conference on Artificial Intelligence and Statistics (AISTATS), 2020. 23 Jul 2020.
A Provably Efficient Sample Collection Strategy for Reinforcement Learning. Jean Tarbouriech, Matteo Pirotta, Michal Valko, A. Lazaric. Neural Information Processing Systems (NeurIPS), 2020. 13 Jul 2020. [OffRL]
Reinforcement Learning for Non-Stationary Markov Decision Processes: The Blessing of (More) Optimism. Wang Chi Cheung, D. Simchi-Levi, Ruihao Zhu. International Conference on Machine Learning (ICML), 2020. 24 Jun 2020. [OffRL]
A Model-free Learning Algorithm for Infinite-horizon Average-reward MDPs with Near-optimal Regret. Mehdi Jafarnia-Jahromi, Chen-Yu Wei, Rahul Jain, Haipeng Luo. 08 Jun 2020.
Almost Optimal Model-Free Reinforcement Learning via Reference-Advantage Decomposition. Zihan Zhang, Yuanshuo Zhou, Xiangyang Ji. 21 Apr 2020. [OffRL]
Tightening Exploration in Upper Confidence Reinforcement Learning. Hippolyte Bourel, Odalric-Ambrym Maillard, M. S. Talebi. International Conference on Machine Learning (ICML), 2020. 20 Apr 2020.
Upper Confidence Primal-Dual Reinforcement Learning for CMDP with Adversarial Loss. Shuang Qiu, Xiaohan Wei, Zhuoran Yang, Jieping Ye, Zhaoran Wang. Neural Information Processing Systems (NeurIPS), 2020. 02 Mar 2020.
Learning Near Optimal Policies with Low Inherent Bellman Error. Andrea Zanette, A. Lazaric, Mykel Kochenderfer, Emma Brunskill. International Conference on Machine Learning (ICML), 2020. 29 Feb 2020. [OffRL]
Near-optimal Regret Bounds for Stochastic Shortest Path. Alon Cohen, Haim Kaplan, Yishay Mansour, Aviv A. Rosenberg. International Conference on Machine Learning (ICML), 2020. 23 Feb 2020.
Learning Adversarial MDPs with Bandit Feedback and Unknown Transition. Chi Jin, Tiancheng Jin, Haipeng Luo, S. Sra, Tiancheng Yu. International Conference on Machine Learning (ICML), 2019. 03 Dec 2019.