Tightening Exploration in Upper Confidence Reinforcement Learning

International Conference on Machine Learning (ICML), 2020

20 April 2020

Hippolyte Bourel

Odalric-Ambrym Maillard

M. S. Talebi

Papers citing "Tightening Exploration in Upper Confidence Reinforcement Learning"

23 / 23 papers shown

Tail Distribution of Regret in Optimistic Reinforcement Learning

Sajad Khodadadian

Mehrdad Moharrami

129

23 Nov 2025

The Confusing Instance Principle for Online Linear Quadratic Control

Waris Radji

Odalric-Ambrym Maillard

OffRL

177

22 Oct 2025

Towards Blackwell Optimality: Bellman Optimality Is All You Can Get

Victor Boone

Adrienne Tuynman

131

15 Oct 2025

Q-Learning with Shift-Aware Upper Confidence Bound in Non-Stationary Reinforcement Learning

185

03 Oct 2025

Statistical and Algorithmic Foundations of Reinforcement Learning

249

19 Jul 2025

Model Selection for Average Reward RL with Application to Utility Maximization in Repeated Games

Alireza Masoumian

James R. Wright

543

09 Nov 2024

Learning Infinite-Horizon Average-Reward Linear Mixture MDPs of Bounded Span

International Conference on Artificial Intelligence and Statistics (AISTATS), 2024

206

19 Oct 2024

How to Shrink Confidence Sets for Many Equivalent Discrete Distributions?

Odalric-Ambrym Maillard

M. S. Talebi

164

22 Jul 2024

Reinforcement Learning and Regret Bounds for Admission Control

International Conference on Machine Learning (ICML), 2024

Lucas Weber

A. Busic

Jiamin Zhu

178

07 Jun 2024

Achieving Tractable Minimax Optimal Regret in Average Reward MDPs

Victor Boone

Zihan Zhang

226

03 Jun 2024

Safety through Permissibility: Shield Construction for Fast and Safe Reinforcement Learning

A. Politowicz

Sahisnu Mazumder

Bing-Quan Liu

251

29 May 2024

Finding good policies in average-reward Markov Decision Processes without prior knowledge

Adrienne Tuynman

Rémy Degenne

Emilie Kaufmann

328

27 May 2024

Utilizing Maximum Mean Discrepancy Barycenter for Propagating the Uncertainty of Value Functions in Reinforcement Learning

Srinjoy Roy

Swagatam Das

336

31 Mar 2024

CRIMED: Lower and Upper Bounds on Regret for Bandits with Unbounded Stochastic Corruption

International Conference on Algorithmic Learning Theory (ALT), 2023

Shubhada Agrawal

Timothée Mathieu

D. Basu

Odalric-Ambrym Maillard

268

28 Sep 2023

Online Reinforcement Learning in Periodic MDP

IEEE Transactions on Artificial Intelligence (IEEE TAI), 2023

Ayush Aniket

Arpan Chattopadhyay

209

16 Mar 2023

Reinforcement Learning in a Birth and Death Process: Breaking the Dependence on the State Space

Neural Information Processing Systems (NeurIPS), 2023

Jonatha Anselmi

B. Gaujal

Louis-Sébastien Rebuffi

282

21 Feb 2023

An Analysis of Model-Based Reinforcement Learning From Abstracted Observations

252

30 Aug 2022

Online Reinforcement Learning for Periodic MDP

Ayush Aniket

Arpan Chattopadhyay

137

25 Jul 2022

Multiple-Play Stochastic Bandits with Shareable Finite-Capacity Arms

International Conference on Machine Learning (ICML), 2022

Xuchuang Wang

Hong Xie

John C. S. Lui

247

17 Jun 2022

Reinforcement Learning for Markovian Bandits: Is Posterior Sampling more Scalable than Optimism?

Nicolas Gast

B. Gaujal

K. Khun

335

16 Jun 2021

UVIP: Model-Free Approach to Evaluate Reinforcement Learning Algorithms

284

05 May 2021

Improved Exploration in Factored Average-Reward MDPs

International Conference on Artificial Intelligence and Statistics (AISTATS), 2020

M. S. Talebi

Anders Jonsson

Odalric-Ambrym Maillard

254

09 Sep 2020

Statistically Robust, Risk-Averse Best Arm Identification in Multi-Armed Bandits

IEEE Transactions on Information Theory (IEEE Trans. Inf. Theory), 2020

Anmol Kagrecha

Jayakrishnan Nair

Krishna Jagannathan

302

28 Aug 2020