Bias no more: high-probability data-dependent regret bounds for adversarial bandits and MDPs

14 June 2020
Chung-Wei Lee
Haipeng Luo
Chen-Yu Wei
Mengxiao Zhang

Papers citing "Bias no more: high-probability data-dependent regret bounds for adversarial bandits and MDPs"

23 / 23 papers shown
Adversarial bandit optimization for approximately linear functions
Zhuoyu Cheng
Kohei Hatano
Eiji Takimoto
10
0
0
27 May 2025
Data-Dependent Regret Bounds for Constrained MABs
Gianmarco Genalti
Francesco Emanuele Stradi
Matteo Castiglioni
A. Marchesi
N. Gatti
38
0
0
26 May 2025
Online Episodic Convex Reinforcement Learning
B. Moreno
Khaled Eldowa
Pierre Gaillard
Margaux Brégère
Nadia Oudjane
OffRL
186
0
0
12 May 2025
Efficient Near-Optimal Algorithm for Online Shortest Paths in Directed Acyclic Graphs with Bandit Feedback Against Adaptive Adversaries
Arnab Maiti
Zhiyuan Fan
Kevin Jamieson
Lillian J. Ratliff
Gabriele Farina
511
1
0
01 Apr 2025
A Model Selection Approach for Corruption Robust Reinforcement Learning
Chen-Yu Wei
Christoph Dann
Julian Zimmert
193
45
0
31 Dec 2024
Learnability in Online Kernel Selection with Memory Constraint via Data-dependent Regret Analysis
Junfan Li
Shizhong Liao
128
0
0
01 Jul 2024
Refined Sample Complexity for Markov Games with Independent Linear Function Approximation
Yan Dai
Qiwen Cui
S. S. Du
89
1
0
11 Feb 2024
Scalable and Independent Learning of Nash Equilibrium Policies in $n$-Player Stochastic Games with Unknown Independent Chains
Tiancheng Qin
S. Rasoul Etesami
81
2
0
04 Dec 2023
Towards Optimal Regret in Adversarial Linear MDPs with Bandit Feedback
Haolin Liu
Chen-Yu Wei
Julian Zimmert
66
6
0
17 Oct 2023
Settling the Sample Complexity of Online Reinforcement Learning
Zihan Zhang
Yuxin Chen
Jason D. Lee
S. Du
OffRL
202
25
0
25 Jul 2023
Improved High-Probability Regret for Adversarial Bandits with Time-Varying Feedback Graphs
Haipeng Luo
Hanghang Tong
Mengxiao Zhang
Yuheng Zhang
46
5
0
04 Oct 2022
Adaptive Bandit Convex Optimization with Heterogeneous Curvature
Haipeng Luo
Mengxiao Zhang
Penghui Zhao
91
5
0
12 Feb 2022
Adaptivity and Non-stationarity: Problem-dependent Dynamic Regret for Online Convex Optimization
Peng Zhao
Yu Zhang
Lijun Zhang
Zhi Zhou
110
50
0
29 Dec 2021
Policy Optimization in Adversarial MDPs: Improved Exploration via Dilated Bonuses
Haipeng Luo
Chen-Yu Wei
Chung-Wei Lee
124
45
0
18 Jul 2021
The best of both worlds: stochastic and adversarial episodic MDPs with unknown transition
Tiancheng Jin
Longbo Huang
Haipeng Luo
84
42
0
08 Jun 2021
Improved Corruption Robust Algorithms for Episodic Reinforcement Learning
Yifang Chen
S. Du
Kevin Jamieson
75
23
0
13 Feb 2021
Achieving Near Instance-Optimality and Minimax-Optimality in Stochastic and Adversarial Linear Bandits Simultaneously
Chung-Wei Lee
Haipeng Luo
Chen-Yu Wei
Mengxiao Zhang
Xiaojin Zhang
96
49
0
11 Feb 2021
Robust Policy Gradient against Strong Data Corruption
Xuezhou Zhang
Yiding Chen
Xiaojin Zhu
Wen Sun
AAML
99
39
0
11 Feb 2021
Non-stationary Reinforcement Learning without Prior Knowledge: An Optimal Black-box Approach
Chen-Yu Wei
Haipeng Luo
OffRL
183
107
0
10 Feb 2021
Finding the Stochastic Shortest Path with Low Regret: The Adversarial Cost and Unknown Transition Case
Liyu Chen
Haipeng Luo
102
31
0
10 Feb 2021
Online Markov Decision Processes with Aggregate Bandit Feedback
Alon Cohen
Haim Kaplan
Tomer Koren
Yishay Mansour
OffRL
80
8
0
31 Jan 2021
Corralling Stochastic Bandit Algorithms
R. Arora
T. V. Marinov
M. Mohri
103
35
0
16 Jun 2020
Bandit Convex Optimization in Non-stationary Environments
Peng Zhao
G. Wang
Lijun Zhang
Zhi Zhou
112
44
0
29 Jul 2019