Concentration of Contractive Stochastic Approximation and Reinforcement Learning
Stochastic Systems (SS), 2021
Abstract
Using a martingale concentration inequality, concentration bounds `from time $n_0$ on' are derived for stochastic approximation algorithms with contractive maps and both martingale difference and Markov noises. These are applied to reinforcement learning algorithms, in particular to asynchronous Q-learning and TD(0).
