Last-Iterate Convergent Policy Gradient Primal-Dual Methods for
Constrained MDPs
Neural Information Processing Systems (NeurIPS), 2023 |
Uncoupled and Convergent Learning in Two-Player Zero-Sum Markov Games
with Bandit Feedback
Neural Information Processing Systems (NeurIPS), 2023 |