Bias no more: high-probability data-dependent regret bounds for adversarial bandits and MDPs

14 June 2020

Papers citing "Bias no more: high-probability data-dependent regret bounds for adversarial bandits and MDPs"

23 / 23 papers shown

Title
Adversarial bandit optimization for approximately linear functions Zhuoyu Cheng Kohei Hatano Eiji Takimoto 14 0 0 27 May 2025
Data-Dependent Regret Bounds for Constrained MABs Gianmarco Genalti Francesco Emanuele Stradi Matteo Castiglioni A. Marchesi N. Gatti 43 0 0 26 May 2025
Online Episodic Convex Reinforcement Learning B. Moreno Khaled Eldowa Pierre Gaillard Margaux Brégère Nadia Oudjane OffRL 194 0 0 12 May 2025
Efficient Near-Optimal Algorithm for Online Shortest Paths in Directed Acyclic Graphs with Bandit Feedback Against Adaptive Adversaries Arnab Maiti Zhiyuan Fan Kevin Jamieson Lillian J. Ratliff Gabriele Farina 513 1 0 01 Apr 2025
A Model Selection Approach for Corruption Robust Reinforcement Learning Chen-Yu Wei Christoph Dann Julian Zimmert 193 45 0 31 Dec 2024
Learnability in Online Kernel Selection with Memory Constraint via Data-dependent Regret Analysis Junfan Li Shizhong Liao 128 0 0 01 Jul 2024
Refined Sample Complexity for Markov Games with Independent Linear Function Approximation Yan Dai Qiwen Cui S. S. Du 89 1 0 11 Feb 2024
Scalable and Independent Learning of Nash Equilibrium Policies in $n$-Player Stochastic Games with Unknown Independent Chains Tiancheng Qin S. Rasoul Etesami 83 2 0 04 Dec 2023
Towards Optimal Regret in Adversarial Linear MDPs with Bandit Feedback Haolin Liu Chen-Yu Wei Julian Zimmert 66 6 0 17 Oct 2023
Settling the Sample Complexity of Online Reinforcement Learning Zihan Zhang Yuxin Chen Jason D. Lee S. Du OffRL 204 25 0 25 Jul 2023
Improved High-Probability Regret for Adversarial Bandits with Time-Varying Feedback Graphs Haipeng Luo Hanghang Tong Mengxiao Zhang Yuheng Zhang 48 5 0 04 Oct 2022
Adaptive Bandit Convex Optimization with Heterogeneous Curvature Haipeng Luo Mengxiao Zhang Penghui Zhao 91 5 0 12 Feb 2022
Adaptivity and Non-stationarity: Problem-dependent Dynamic Regret for Online Convex Optimization Peng Zhao Yu Zhang Lijun Zhang Zhi Zhou 110 50 0 29 Dec 2021
Policy Optimization in Adversarial MDPs: Improved Exploration via Dilated Bonuses Haipeng Luo Chen-Yu Wei Chung-Wei Lee 124 45 0 18 Jul 2021
The best of both worlds: stochastic and adversarial episodic MDPs with unknown transition Tiancheng Jin Longbo Huang Haipeng Luo 84 42 0 08 Jun 2021
Improved Corruption Robust Algorithms for Episodic Reinforcement Learning Yifang Chen S. Du Kevin Jamieson 75 23 0 13 Feb 2021
Achieving Near Instance-Optimality and Minimax-Optimality in Stochastic and Adversarial Linear Bandits Simultaneously Chung-Wei Lee Haipeng Luo Chen-Yu Wei Mengxiao Zhang Xiaojin Zhang 96 49 0 11 Feb 2021
Robust Policy Gradient against Strong Data Corruption Xuezhou Zhang Yiding Chen Xiaojin Zhu Wen Sun AAML 99 39 0 11 Feb 2021
Non-stationary Reinforcement Learning without Prior Knowledge: An Optimal Black-box Approach Chen-Yu Wei Haipeng Luo OffRL 183 107 0 10 Feb 2021
Finding the Stochastic Shortest Path with Low Regret: The Adversarial Cost and Unknown Transition Case Liyu Chen Haipeng Luo 102 31 0 10 Feb 2021
Online Markov Decision Processes with Aggregate Bandit Feedback Alon Cohen Haim Kaplan Tomer Koren Yishay Mansour OffRL 82 8 0 31 Jan 2021
Corralling Stochastic Bandit Algorithms R. Arora T. V. Marinov M. Mohri 107 35 0 16 Jun 2020
Bandit Convex Optimization in Non-stationary Environments Peng Zhao G. Wang Lijun Zhang Zhi Zhou 112 44 0 29 Jul 2019