AlphaSnake: Policy Iteration on a Nondeterministic NP-hard Markov Decision Process
AAAI Conference on Artificial Intelligence (AAAI), 2022

A Theoretical Understanding of Gradient Bias in Meta-Reinforcement Learning
Neural Information Processing Systems (NeurIPS), 2021