2006.08040
Bias no more: high-probability data-dependent regret bounds for adversarial bandits and MDPs
14 June 2020
Chung-Wei Lee
Haipeng Luo
Chen-Yu Wei
Mengxiao Zhang
Papers citing
"Bias no more: high-probability data-dependent regret bounds for adversarial bandits and MDPs"
23 / 23 papers shown
Title
Adversarial bandit optimization for approximately linear functions
Zhuoyu Cheng
Kohei Hatano
Eiji Takimoto
14
0
0
27 May 2025
Data-Dependent Regret Bounds for Constrained MABs
Gianmarco Genalti
Francesco Emanuele Stradi
Matteo Castiglioni
A. Marchesi
N. Gatti
43
0
0
26 May 2025
Online Episodic Convex Reinforcement Learning
B. Moreno
Khaled Eldowa
Pierre Gaillard
Margaux Brégère
Nadia Oudjane
OffRL
194
0
0
12 May 2025
Efficient Near-Optimal Algorithm for Online Shortest Paths in Directed Acyclic Graphs with Bandit Feedback Against Adaptive Adversaries
Arnab Maiti
Zhiyuan Fan
Kevin Jamieson
Lillian J. Ratliff
Gabriele Farina
513
1
0
01 Apr 2025
A Model Selection Approach for Corruption Robust Reinforcement Learning
Chen-Yu Wei
Christoph Dann
Julian Zimmert
193
45
0
31 Dec 2024
Learnability in Online Kernel Selection with Memory Constraint via Data-dependent Regret Analysis
Junfan Li
Shizhong Liao
128
0
0
01 Jul 2024
Refined Sample Complexity for Markov Games with Independent Linear Function Approximation
Yan Dai
Qiwen Cui
S. S. Du
89
1
0
11 Feb 2024
Scalable and Independent Learning of Nash Equilibrium Policies in n-Player Stochastic Games with Unknown Independent Chains
Tiancheng Qin
S. Rasoul Etesami
83
2
0
04 Dec 2023
Towards Optimal Regret in Adversarial Linear MDPs with Bandit Feedback
Haolin Liu
Chen-Yu Wei
Julian Zimmert
66
6
0
17 Oct 2023
Settling the Sample Complexity of Online Reinforcement Learning
Zihan Zhang
Yuxin Chen
Jason D. Lee
S. Du
OffRL
204
25
0
25 Jul 2023
Improved High-Probability Regret for Adversarial Bandits with Time-Varying Feedback Graphs
Haipeng Luo
Hanghang Tong
Mengxiao Zhang
Yuheng Zhang
48
5
0
04 Oct 2022
Adaptive Bandit Convex Optimization with Heterogeneous Curvature
Haipeng Luo
Mengxiao Zhang
Penghui Zhao
91
5
0
12 Feb 2022
Adaptivity and Non-stationarity: Problem-dependent Dynamic Regret for Online Convex Optimization
Peng Zhao
Yu Zhang
Lijun Zhang
Zhi Zhou
110
50
0
29 Dec 2021
Policy Optimization in Adversarial MDPs: Improved Exploration via Dilated Bonuses
Haipeng Luo
Chen-Yu Wei
Chung-Wei Lee
124
45
0
18 Jul 2021
The best of both worlds: stochastic and adversarial episodic MDPs with unknown transition
Tiancheng Jin
Longbo Huang
Haipeng Luo
84
42
0
08 Jun 2021
Improved Corruption Robust Algorithms for Episodic Reinforcement Learning
Yifang Chen
S. Du
Kevin Jamieson
75
23
0
13 Feb 2021
Achieving Near Instance-Optimality and Minimax-Optimality in Stochastic and Adversarial Linear Bandits Simultaneously
Chung-Wei Lee
Haipeng Luo
Chen-Yu Wei
Mengxiao Zhang
Xiaojin Zhang
96
49
0
11 Feb 2021
Robust Policy Gradient against Strong Data Corruption
Xuezhou Zhang
Yiding Chen
Xiaojin Zhu
Wen Sun
AAML
99
39
0
11 Feb 2021
Non-stationary Reinforcement Learning without Prior Knowledge: An Optimal Black-box Approach
Chen-Yu Wei
Haipeng Luo
OffRL
183
107
0
10 Feb 2021
Finding the Stochastic Shortest Path with Low Regret: The Adversarial Cost and Unknown Transition Case
Liyu Chen
Haipeng Luo
102
31
0
10 Feb 2021
Online Markov Decision Processes with Aggregate Bandit Feedback
Alon Cohen
Haim Kaplan
Tomer Koren
Yishay Mansour
OffRL
82
8
0
31 Jan 2021
Corralling Stochastic Bandit Algorithms
R. Arora
T. V. Marinov
M. Mohri
107
35
0
16 Jun 2020
Bandit Convex Optimization in Non-stationary Environments
Peng Zhao
G. Wang
Lijun Zhang
Zhi Zhou
112
44
0
29 Jul 2019