DOPL: Direct Online Preference Learning for Restless Bandits with Preference Feedback. International Conference on Learning Representations (ICLR), 2024 |
Bayesian Learning of Optimal Policies in Markov Decision Processes with Countably Infinite State-Space. Neural Information Processing Systems (NeurIPS), 2023 |