Efficient Bias-Span-Constrained Exploration-Exploitation in Reinforcement Learning
v1, v2 (latest)

12 February 2018
Ronan Fruit
Matteo Pirotta
A. Lazaric
R. Ortner

Papers citing "Efficient Bias-Span-Constrained Exploration-Exploitation in Reinforcement Learning"

50 / 90 papers shown
Q-Learning with Fine-Grained Gap-Dependent Regret
Haochen Zhang
Zhong Zheng
Lingzhou Xue
202
1
0
08 Oct 2025
Q-learning with Posterior Sampling
Priyank Agrawal
Shipra Agrawal
Azmat Azati
OffRL, GP
367
2
0
01 Jun 2025
Model Selection for Average Reward RL with Application to Utility Maximization in Repeated Games
Alireza Masoumian
James R. Wright
587
2
0
09 Nov 2024
Provably Adaptive Average Reward Reinforcement Learning for Metric Spaces
Conference on Uncertainty in Artificial Intelligence (UAI), 2024
Avik Kar
Rahul Singh
257
1
0
25 Oct 2024
Learning Infinite-Horizon Average-Reward Linear Mixture MDPs of Bounded Span
International Conference on Artificial Intelligence and Statistics (AISTATS), 2024
Woojin Chae
Kihyuk Hong
Yufan Zhang
Ambuj Tewari
Dabeen Lee
208
1
0
19 Oct 2024
Gap-Dependent Bounds for Q-Learning using Reference-Advantage Decomposition
International Conference on Learning Representations (ICLR), 2024
Zhong Zheng
Haochen Zhang
Lingzhou Xue
OffRL
447
9
0
10 Oct 2024
State-free Reinforcement Learning
Neural Information Processing Systems (NeurIPS), 2024
Mingyu Chen
Aldo Pacchiano
Xuezhou Zhang
368
0
0
27 Sep 2024
Optimistic Q-learning for average reward and episodic reinforcement learning
Priyank Agrawal
Shipra Agrawal
473
9
0
18 Jul 2024
Achieving Tractable Minimax Optimal Regret in Average Reward MDPs
Victor Boone
Zihan Zhang
232
10
0
03 Jun 2024
Finding good policies in average-reward Markov Decision Processes without prior knowledge
Adrienne Tuynman
Rémy Degenne
Emilie Kaufmann
347
12
0
27 May 2024
Sample-efficient Learning of Infinite-horizon Average-reward MDPs with General Function Approximation
Jianliang He
Han Zhong
Zhuoran Yang
355
6
0
19 Apr 2024
Utilizing Maximum Mean Discrepancy Barycenter for Propagating the Uncertainty of Value Functions in Reinforcement Learning
Srinjoy Roy
Swagatam Das
358
0
0
31 Mar 2024
Span-Based Optimal Sample Complexity for Weakly Communicating and General Average Reward MDPs
Neural Information Processing Systems (NeurIPS), 2024
M. Zurek
Yudong Chen
472
15
0
18 Mar 2024
Dealing with unbounded gradients in stochastic saddle-point optimization
Gergely Neu
Nneka Okolo
408
10
0
21 Feb 2024
Quantum Speedups in Regret Analysis of Infinite Horizon Average-Reward Markov Decision Processes
Bhargav Ganguly
Yang Xu
Vaneet Aggarwal
585
3
0
18 Oct 2023
Regret Analysis of Policy Gradient Algorithm for Infinite Horizon Average Reward Markov Decision Processes
AAAI Conference on Artificial Intelligence (AAAI), 2023
Qinbo Bai
Washim Uddin Mondal
Vaneet Aggarwal
404
23
0
05 Sep 2023
On Reward Structures of Markov Decision Processes
Falcon Z. Dai
307
1
0
28 Aug 2023
Learning Optimal Admission Control in Partially Observable Queueing Networks
Jonatha Anselmi
B. Gaujal
Louis-Sébastien Rebuffi
182
2
0
04 Aug 2023
Settling the Sample Complexity of Online Reinforcement Learning
Annual Conference Computational Learning Theory (COLT), 2023
Zihan Zhang
Yuxin Chen
Jason D. Lee
S. Du
OffRL
884
42
0
25 Jul 2023
Sharper Model-free Reinforcement Learning for Average-reward Markov Decision Processes
Annual Conference Computational Learning Theory (COLT), 2023
Zihan Zhang
Qiaomin Xie
OffRL
288
29
0
28 Jun 2023
A Study of Global and Episodic Bonuses for Exploration in Contextual MDPs
International Conference on Machine Learning (ICML), 2023
Mikael Henaff
Minqi Jiang
Roberta Raileanu
235
17
0
05 Jun 2023
Seizing Serendipity: Exploiting the Value of Past Success in Off-Policy Actor-Critic
International Conference on Machine Learning (ICML), 2023
Tianying Ji
Yuping Luo
Gang Hua
Xianyuan Zhan
Jianwei Zhang
Huazhe Xu
OffRL, OnRL
489
23
0
05 Jun 2023
Reinforcement Learning with Delayed, Composite, and Partially Anonymous Reward
Washim Uddin Mondal
Vaneet Aggarwal
291
3
0
04 May 2023
Variance-aware robust reinforcement learning with linear function approximation under heavy-tailed rewards
Xiang Li
Qiang Sun
307
10
0
09 Mar 2023
Optimistic Planning by Regularized Dynamic Programming
International Conference on Machine Learning (ICML), 2023
Antoine Moulin
Gergely Neu
492
8
0
27 Feb 2023
Reinforcement Learning in a Birth and Death Process: Breaking the Dependence on the State Space
Neural Information Processing Systems (NeurIPS), 2023
Jonatha Anselmi
B. Gaujal
Louis-Sébastien Rebuffi
285
3
0
21 Feb 2023
Sharp Variance-Dependent Bounds in Reinforcement Learning: Best of Both Worlds in Stochastic and Deterministic Environments
International Conference on Machine Learning (ICML), 2023
Runlong Zhou
Zihan Zhang
S. Du
406
19
0
31 Jan 2023
Improved Regret for Efficient Online Reinforcement Learning with Linear Function Approximation
International Conference on Machine Learning (ICML), 2023
Uri Sherman
Tomer Koren
Yishay Mansour
379
14
0
30 Jan 2023
Horizon-Free and Variance-Dependent Reinforcement Learning for Latent Markov Decision Processes
International Conference on Machine Learning (ICML), 2022
Runlong Zhou
Ruosong Wang
S. Du
419
3
0
20 Oct 2022
Unpacking Reward Shaping: Understanding the Benefits of Reward Engineering on Sample Complexity
Neural Information Processing Systems (NeurIPS), 2022
Abhishek Gupta
Aldo Pacchiano
Yuexiang Zhai
Sham Kakade
Sergey Levine
OffRL
267
100
0
18 Oct 2022
Optimistic Posterior Sampling for Reinforcement Learning with Few Samples and Tight Guarantees
Neural Information Processing Systems (NeurIPS), 2022
D. Tiapkin
Denis Belomestny
Daniele Calandriello
Eric Moulines
Rémi Munos
A. Naumov
Mark Rowland
Michal Valko
Pierre Menard
267
12
0
28 Sep 2022
An Analysis of Model-Based Reinforcement Learning From Abstracted Observations
Rolf A. N. Starre
Marco Loog
E. Congeduti
F. Oliehoek
OffRL
262
3
0
30 Aug 2022
Logarithmic regret bounds for continuous-time average-reward Markov decision processes
SIAM Journal of Control and Optimization (SICON), 2022
Ningyuan Chen
X. Zhou
368
9
0
23 May 2022
Slowly Changing Adversarial Bandit Algorithms are Efficient for Discounted MDPs
International Conference on Algorithmic Learning Theory (ALT), 2022
Ian A. Kash
L. Reyzin
Zishun Yu
476
1
0
18 May 2022
From Dirichlet to Rubin: Optimistic Exploration in RL without Bonuses
International Conference on Machine Learning (ICML), 2022
D. Tiapkin
Denis Belomestny
Eric Moulines
A. Naumov
S. Samsonov
Yunhao Tang
Michal Valko
Pierre Menard
315
23
0
16 May 2022
On learning Whittle index policy for restless bandits with scalable regret
IEEE Transactions on Control of Network Systems (IEEE TCNS), 2022
N. Akbarzadeh
Aditya Mahajan
339
15
0
07 Feb 2022
Learning Infinite-Horizon Average-Reward Markov Decision Processes with Constraints
International Conference on Machine Learning (ICML), 2022
Liyu Chen
R. Jain
Haipeng Luo
325
33
0
31 Jan 2022
Bad-Policy Density: A Measure of Reinforcement Learning Hardness
David Abel
Cameron Allen
Dilip Arumugam
D Ellis Hershkowitz
Michael L. Littman
Lawson L. S. Wong
271
3
0
07 Oct 2021
Understanding Domain Randomization for Sim-to-real Transfer
Xiaoyu Chen
Jiachen Hu
Chi Jin
Lihong Li
Liwei Wang
470
164
0
07 Oct 2021
Concave Utility Reinforcement Learning with Zero-Constraint Violations
Mridul Agarwal
Qinbo Bai
Vaneet Aggarwal
471
17
0
12 Sep 2021
A Survey of Exploration Methods in Reinforcement Learning
Susan Amin
Maziar Gomrokchi
Harsh Satija
H. V. Hoof
Doina Precup
OffRL
408
106
0
01 Sep 2021
Policy Optimization in Adversarial MDPs: Improved Exploration via Dilated Bonuses
Neural Information Processing Systems (NeurIPS), 2021
Haipeng Luo
Chen-Yu Wei
Chung-Wei Lee
318
52
0
18 Jul 2021
Reinforcement Learning for Markovian Bandits: Is Posterior Sampling more Scalable than Optimism?
Nicolas Gast
B. Gaujal
K. Khun
356
2
0
16 Jun 2021
Towards Tight Bounds on the Sample Complexity of Average-reward MDPs
International Conference on Machine Learning (ICML), 2021
Yujia Jin
Aaron Sidford
153
45
0
13 Jun 2021
Online Learning for Stochastic Shortest Path Model via Posterior Sampling
Mehdi Jafarnia-Jahromi
Liyu Chen
Rahul Jain
Haipeng Luo
OffRL
297
18
0
09 Jun 2021
Stochastic Shortest Path: Minimax, Parameter-Free and Towards Horizon-Free Regret
Neural Information Processing Systems (NeurIPS), 2021
Jean Tarbouriech
Runlong Zhou
S. Du
Matteo Pirotta
M. Valko
A. Lazaric
271
38
0
22 Apr 2021
Minimax Regret for Stochastic Shortest Path
Neural Information Processing Systems (NeurIPS), 2021
Alon Cohen
Yonathan Efroni
Yishay Mansour
Aviv A. Rosenberg
416
31
0
24 Mar 2021
UCB Momentum Q-learning: Correcting the bias without forgetting
International Conference on Machine Learning (ICML), 2021
Pierre Menard
O. D. Domingues
Xuedong Shang
Michal Valko
383
51
0
01 Mar 2021
Online Learning for Unknown Partially Observable MDPs
International Conference on Artificial Intelligence and Statistics (AISTATS), 2021
Mehdi Jafarnia-Jahromi
Rahul Jain
A. Nayyar
330
24
0
25 Feb 2021
Improved Regret Bound and Experience Replay in Regularized Policy Iteration
International Conference on Machine Learning (ICML), 2021
N. Lazić
Dong Yin
Yasin Abbasi-Yadkori
Csaba Szepesvári
OffRL
158
20
0
25 Feb 2021