2007.10229
Cited By
Regret Analysis of a Markov Policy Gradient Algorithm for Multi-arm Bandits
20 July 2020
D. Denisov, N. Walton
Papers citing "Regret Analysis of a Markov Policy Gradient Algorithm for Multi-arm Bandits"
2 / 2 papers shown
Title
Small steps no more: Global convergence of stochastic gradient bandits for arbitrary learning rates
Jincheng Mei, Bo Dai, Alekh Agarwal, Sharan Vaswani, Anant Raj, Csaba Szepesvári, Dale Schuurmans
87
0
0
11 Feb 2025
The Role of Baselines in Policy Gradient Optimization
Jincheng Mei, Wesley Chung, Valentin Thomas, Bo Dai, Csaba Szepesvári, Dale Schuurmans
24
15
0
16 Jan 2023