Global Optimality and Finite Sample Analysis of Softmax Off-Policy Actor Critic under State Distribution Mismatch. Journal of Machine Learning Research (JMLR), 2021
Doubly Robust Off-Policy Actor-Critic: Convergence and Optimality. International Conference on Machine Learning (ICML), 2021