Model-free Reinforcement Learning in Infinite-horizon Average-reward Markov Decision Processes. International Conference on Machine Learning (ICML), 2019
Bandit Convex Optimization in Non-stationary Environments. International Conference on Artificial Intelligence and Statistics (AISTATS), 2019
Bandits with Feedback Graphs and Switching Costs. Neural Information Processing Systems (NeurIPS), 2019
Model selection for contextual bandits. Neural Information Processing Systems (NeurIPS), 2019
Equipping Experts/Bandits with Long-term Memory. Neural Information Processing Systems (NeurIPS), 2019
OSOM: A simultaneously optimal algorithm for multi-armed and linear contextual bandits. International Conference on Artificial Intelligence and Statistics (AISTATS), 2019
Hedging the Drift: Learning to Optimize under Non-Stationarity. Management Science (MS), 2019
Tsallis-INF: An Optimal Algorithm for Stochastic and Adversarial Bandits. Journal of Machine Learning Research (JMLR), 2018