Efficient Bias-Span-Constrained Exploration-Exploitation in Reinforcement Learning

12 February 2018

Papers citing "Efficient Bias-Span-Constrained Exploration-Exploitation in Reinforcement Learning"

50 / 90 papers shown

Q-Learning with Fine-Grained Gap-Dependent Regret

Haochen Zhang

Zhong Zheng

Lingzhou Xue

202

08 Oct 2025

Q-learning with Posterior Sampling

367

01 Jun 2025

Model Selection for Average Reward RL with Application to Utility Maximization in Repeated Games

Alireza Masoumian

James R. Wright

587

09 Nov 2024

Provably Adaptive Average Reward Reinforcement Learning for Metric Spaces

Conference on Uncertainty in Artificial Intelligence (UAI), 2024

Avik Kar

Rahul Singh

257

25 Oct 2024

Learning Infinite-Horizon Average-Reward Linear Mixture MDPs of Bounded Span

International Conference on Artificial Intelligence and Statistics (AISTATS), 2024

208

19 Oct 2024

Gap-Dependent Bounds for Q-Learning using Reference-Advantage Decomposition

International Conference on Learning Representations (ICLR), 2024

447

10 Oct 2024

State-free Reinforcement Learning

Neural Information Processing Systems (NeurIPS), 2024

Mingyu Chen

Aldo Pacchiano

Xuezhou Zhang

368

27 Sep 2024

Optimistic Q-learning for average reward and episodic reinforcement learning

Priyank Agrawal

Shipra Agrawal

473

18 Jul 2024

Achieving Tractable Minimax Optimal Regret in Average Reward MDPs

Victor Boone

Zihan Zhang

232

03 Jun 2024

Finding good policies in average-reward Markov Decision Processes without prior knowledge

Adrienne Tuynman

Rémy Degenne

Emilie Kaufmann

347

27 May 2024

Sample-efficient Learning of Infinite-horizon Average-reward MDPs with General Function Approximation

Jianliang He

Han Zhong

Zhuoran Yang

355

19 Apr 2024

Utilizing Maximum Mean Discrepancy Barycenter for Propagating the Uncertainty of Value Functions in Reinforcement Learning

Srinjoy Roy

Swagatam Das

358

31 Mar 2024

Span-Based Optimal Sample Complexity for Weakly Communicating and General Average Reward MDPs

Neural Information Processing Systems (NeurIPS), 2024

M. Zurek

Yudong Chen

472

18 Mar 2024

Dealing with unbounded gradients in stochastic saddle-point optimization

Gergely Neu

Nneka Okolo

408

21 Feb 2024

Quantum Speedups in Regret Analysis of Infinite Horizon Average-Reward Markov Decision Processes

Bhargav Ganguly

Yang Xu

Vaneet Aggarwal

585

18 Oct 2023

Regret Analysis of Policy Gradient Algorithm for Infinite Horizon Average Reward Markov Decision Processes

AAAI Conference on Artificial Intelligence (AAAI), 2023

Qinbo Bai

Washim Uddin Mondal

Vaneet Aggarwal

404

05 Sep 2023

On Reward Structures of Markov Decision Processes

Falcon Z. Dai

307

28 Aug 2023

Learning Optimal Admission Control in Partially Observable Queueing Networks

Jonatha Anselmi

B. Gaujal

Louis-Sébastien Rebuffi

182

04 Aug 2023

Settling the Sample Complexity of Online Reinforcement Learning

Annual Conference on Computational Learning Theory (COLT), 2023

884

25 Jul 2023

Sharper Model-free Reinforcement Learning for Average-reward Markov Decision Processes

Annual Conference on Computational Learning Theory (COLT), 2023

Zihan Zhang

Qiaomin Xie

OffRL

288

28 Jun 2023

A Study of Global and Episodic Bonuses for Exploration in Contextual MDPs

International Conference on Machine Learning (ICML), 2023

Mikael Henaff

Minqi Jiang

Roberta Raileanu

235

05 Jun 2023

Seizing Serendipity: Exploiting the Value of Past Success in Off-Policy Actor-Critic

International Conference on Machine Learning (ICML), 2023

489

05 Jun 2023

Reinforcement Learning with Delayed, Composite, and Partially Anonymous Reward

Washim Uddin Mondal

Vaneet Aggarwal

291

04 May 2023

Variance-aware robust reinforcement learning with linear function approximation under heavy-tailed rewards

Xiang Li

Qiang Sun

307

09 Mar 2023

Optimistic Planning by Regularized Dynamic Programming

International Conference on Machine Learning (ICML), 2023

Antoine Moulin

Gergely Neu

492

27 Feb 2023

Reinforcement Learning in a Birth and Death Process: Breaking the Dependence on the State Space

Neural Information Processing Systems (NeurIPS), 2023

Jonatha Anselmi

B. Gaujal

Louis-Sébastien Rebuffi

285

21 Feb 2023

Sharp Variance-Dependent Bounds in Reinforcement Learning: Best of Both Worlds in Stochastic and Deterministic Environments

International Conference on Machine Learning (ICML), 2023

Runlong Zhou

Zihan Zhang

S. Du

406

31 Jan 2023

Improved Regret for Efficient Online Reinforcement Learning with Linear Function Approximation

International Conference on Machine Learning (ICML), 2023

Uri Sherman

Tomer Koren

Yishay Mansour

379

30 Jan 2023

Horizon-Free and Variance-Dependent Reinforcement Learning for Latent Markov Decision Processes

International Conference on Machine Learning (ICML), 2022

Runlong Zhou

Ruosong Wang

S. Du

419

20 Oct 2022

Unpacking Reward Shaping: Understanding the Benefits of Reward Engineering on Sample Complexity

Neural Information Processing Systems (NeurIPS), 2022

Abhishek Gupta

267

100

18 Oct 2022

Optimistic Posterior Sampling for Reinforcement Learning with Few Samples and Tight Guarantees

Neural Information Processing Systems (NeurIPS), 2022

Daniele Calandriello

Pierre Menard

267

28 Sep 2022

An Analysis of Model-Based Reinforcement Learning From Abstracted Observations

262

30 Aug 2022

Logarithmic regret bounds for continuous-time average-reward Markov decision processes

SIAM Journal on Control and Optimization (SICON), 2022

Ningyuan Chen

X. Zhou

368

23 May 2022

Slowly Changing Adversarial Bandit Algorithms are Efficient for Discounted MDPs

International Conference on Algorithmic Learning Theory (ALT), 2022

Ian A. Kash

L. Reyzin

Zishun Yu

476

18 May 2022

From Dirichlet to Rubin: Optimistic Exploration in RL without Bonuses

International Conference on Machine Learning (ICML), 2022

Pierre Menard

315

16 May 2022

On learning Whittle index policy for restless bandits with scalable regret

IEEE Transactions on Control of Network Systems (IEEE TCNS), 2022

N. Akbarzadeh

Aditya Mahajan

339

07 Feb 2022

Learning Infinite-Horizon Average-Reward Markov Decision Processes with Constraints

International Conference on Machine Learning (ICML), 2022

Liyu Chen

R. Jain

Haipeng Luo

325

31 Jan 2022

Bad-Policy Density: A Measure of Reinforcement Learning Hardness

Cameron Allen

Michael L. Littman

271

07 Oct 2021

Understanding Domain Randomization for Sim-to-real Transfer

470

164

07 Oct 2021

Concave Utility Reinforcement Learning with Zero-Constraint Violations

Mridul Agarwal

Qinbo Bai

Vaneet Aggarwal

471

12 Sep 2021

A Survey of Exploration Methods in Reinforcement Learning

408

106

01 Sep 2021

Policy Optimization in Adversarial MDPs: Improved Exploration via Dilated Bonuses

Neural Information Processing Systems (NeurIPS), 2021

Haipeng Luo

Chen-Yu Wei

Chung-Wei Lee

318

18 Jul 2021

Reinforcement Learning for Markovian Bandits: Is Posterior Sampling more Scalable than Optimism?

Nicolas Gast

B. Gaujal

K. Khun

356

16 Jun 2021

Towards Tight Bounds on the Sample Complexity of Average-reward MDPs

International Conference on Machine Learning (ICML), 2021

Yujia Jin

Aaron Sidford

153

13 Jun 2021

Online Learning for Stochastic Shortest Path Model via Posterior Sampling

Mehdi Jafarnia-Jahromi

297

09 Jun 2021

Stochastic Shortest Path: Minimax, Parameter-Free and Towards Horizon-Free Regret

Neural Information Processing Systems (NeurIPS), 2021

Runlong Zhou

271

22 Apr 2021

Minimax Regret for Stochastic Shortest Path

Neural Information Processing Systems (NeurIPS), 2021

416

24 Mar 2021

UCB Momentum Q-learning: Correcting the bias without forgetting

International Conference on Machine Learning (ICML), 2021

Pierre Menard

O. D. Domingues

Xuedong Shang

Michal Valko

383

01 Mar 2021

Online Learning for Unknown Partially Observable MDPs

International Conference on Artificial Intelligence and Statistics (AISTATS), 2021

Mehdi Jafarnia-Jahromi

Rahul Jain

A. Nayyar

330

25 Feb 2021

Improved Regret Bound and Experience Replay in Regularized Policy Iteration

International Conference on Machine Learning (ICML), 2021

158

25 Feb 2021