arXiv:1807.02373
Near Optimal Exploration-Exploitation in Non-Communicating Markov Decision Processes
6 July 2018
Ronan Fruit
Matteo Pirotta
A. Lazaric
Papers citing "Near Optimal Exploration-Exploitation in Non-Communicating Markov Decision Processes" (29 papers)
Model Selection for Average Reward RL with Application to Utility Maximization in Repeated Games
Alireza Masoumian
James R. Wright
142
1
0
09 Nov 2024
Beyond Optimism: Exploration With Partially Observable Rewards
Simone Parisi
Alireza Kazemipour
Michael Bowling
OffRL
96
2
0
20 Jun 2024
Finding good policies in average-reward Markov Decision Processes without prior knowledge
Adrienne Tuynman
Rémy Degenne
Emilie Kaufmann
100
4
0
27 May 2024
Span-Based Optimal Sample Complexity for Weakly Communicating and General Average Reward MDPs
M. Zurek
Yudong Chen
73
6
0
18 Mar 2024
Dealing with unbounded gradients in stochastic saddle-point optimization
Gergely Neu
Nneka Okolo
87
5
0
21 Feb 2024
A Study of Global and Episodic Bonuses for Exploration in Contextual MDPs
Mikael Henaff
Minqi Jiang
Roberta Raileanu
86
13
0
05 Jun 2023
Sharp Variance-Dependent Bounds in Reinforcement Learning: Best of Both Worlds in Stochastic and Deterministic Environments
Runlong Zhou
Zihan Zhang
S. Du
87
12
0
31 Jan 2023
Multi-Armed Bandits with Self-Information Rewards
Nir Weinberger
M. Yemini
22
4
0
06 Sep 2022
Slowly Changing Adversarial Bandit Algorithms are Efficient for Discounted MDPs
Ian A. Kash
L. Reyzin
Zishun Yu
101
0
0
18 May 2022
Horizon-Free Reinforcement Learning in Polynomial Time: the Power of Stationary Policies
Zihan Zhang
Xiangyang Ji
S. Du
83
25
0
24 Mar 2022
Near-Optimal Randomized Exploration for Tabular Markov Decision Processes
Zhihan Xiong
Ruoqi Shen
Qiwen Cui
Maryam Fazel
S. Du
85
10
0
19 Feb 2021
Nearly Minimax Optimal Regret for Learning Infinite-horizon Average-reward MDPs with Linear Function Approximation
Yue Wu
Dongruo Zhou
Quanquan Gu
62
21
0
15 Feb 2021
Is Reinforcement Learning More Difficult Than Bandits? A Near-optimal Algorithm Escaping the Curse of Horizon
Zihan Zhang
Xiangyang Ji
S. Du
OffRL
128
107
0
28 Sep 2020
Adaptive KL-UCB based Bandit Algorithms for Markovian and i.i.d. Settings
Arghyadip Roy
Sanjay Shakkottai
R. Srikant
50
2
0
14 Sep 2020
Reinforcement Learning for Non-Stationary Markov Decision Processes: The Blessing of (More) Optimism
Wang Chi Cheung
D. Simchi-Levi
Ruihao Zhu
OffRL
98
96
0
24 Jun 2020
A Survey of Reinforcement Learning Algorithms for Dynamically Varying Environments
Sindhu Padakandla
75
155
0
19 May 2020
Learning Algorithms for Minimizing Queue Length Regret
Thomas Stahlbuhk
B. Shrader
E. Modiano
21
2
0
11 May 2020
Tightening Exploration in Upper Confidence Reinforcement Learning
Hippolyte Bourel
Odalric-Ambrym Maillard
M. S. Talebi
71
31
0
20 Apr 2020
Conservative Exploration in Reinforcement Learning
Evrard Garcelon
Mohammad Ghavamzadeh
A. Lazaric
Matteo Pirotta
80
28
0
08 Feb 2020
No-Regret Exploration in Goal-Oriented Reinforcement Learning
Jean Tarbouriech
Evrard Garcelon
Michal Valko
Matteo Pirotta
A. Lazaric
105
46
0
07 Dec 2019
Performance Effectiveness of Multimedia Information Search Using the Epsilon-Greedy Algorithm
Nikki Lijing Kuang
C. Leung
29
8
0
22 Nov 2019
The Restless Hidden Markov Bandit with Linear Rewards and Side Information
M. Yemini
Amir Leshem
A. Somekh-Baruch
84
4
0
22 Oct 2019
Model-free Reinforcement Learning in Infinite-horizon Average-reward Markov Decision Processes
Chen-Yu Wei
Mehdi Jafarnia-Jahromi
Haipeng Luo
Hiteshi Sharma
R. Jain
168
108
0
15 Oct 2019
Maximum Expected Hitting Cost of a Markov Decision Process and Informativeness of Rewards
Falcon Z. Dai
Matthew R. Walter
36
6
0
03 Jul 2019
Regret Minimization for Reinforcement Learning by Evaluating the Optimal Bias Function
Zihan Zhang
Xiangyang Ji
82
72
0
12 Jun 2019
Non-Stationary Reinforcement Learning: The Blessing of (More) Optimism
Wang Chi Cheung
D. Simchi-Levi
Ruihao Zhu
OffRL
58
7
0
07 Jun 2019
Exploration-Exploitation Trade-off in Reinforcement Learning on Online Markov Decision Processes with Global Concave Rewards
Wang Chi Cheung
51
18
0
15 May 2019
Exploration Bonus for Regret Minimization in Undiscounted Discrete and Continuous Markov Decision Processes
Jian Qian
Ronan Fruit
Matteo Pirotta
A. Lazaric
47
10
0
11 Dec 2018
Regret Bounds for Reinforcement Learning via Markov Chain Concentration
R. Ortner
93
46
0
06 Aug 2018