A Unified Framework for Alternating Offline Model Training and Policy Learning
Neural Information Processing Systems (NeurIPS), 2022 |
Non-asymptotic Confidence Intervals of Off-policy Evaluation: Primal and Dual Bounds
International Conference on Learning Representations (ICLR), 2021 |