Model Selection for Off-policy Evaluation: New Algorithms and Experimental Protocol

11 February 2025
Pai Liu
Lingfeng Zhao
Shivangi Agarwal
Jinghan Liu
Audrey Huang
Philip Amortila
Nan Jiang
Communities: OODD, OffRL
Abstract

Holdout validation and hyperparameter tuning from data are long-standing problems in offline reinforcement learning (RL). A standard framework is to use off-policy evaluation (OPE) methods to evaluate and select among candidate policies, but OPE either incurs exponential variance (e.g., importance sampling) or has hyperparameters of its own (e.g., FQE and model-based methods). In this work we focus on hyperparameter tuning for OPE itself, which is even more under-investigated. Concretely, we select among candidate value functions ("model-free") or dynamics models ("model-based") to best assess the performance of a target policy. Our contributions are twofold. We develop: (1) new model-free and model-based selectors with theoretical guarantees, and (2) a new experimental protocol for empirically evaluating them. Compared to the model-free protocol in prior works, our new protocol allows for more stable generation of candidate value functions, better control of misspecification, and evaluation of model-free and model-based methods alike. We exemplify the protocol on a Gym environment, and find that our new model-free selector, LSTD-Tournament, demonstrates promising empirical performance.
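To make the model-free selection setting concrete, below is a minimal, hypothetical Python sketch. It is not the paper's LSTD-Tournament selector; the dataset format, function signatures, and the TD-error criterion are all assumptions for illustration. Given candidate Q-functions for a fixed target policy, a simple baseline selector ranks them by average squared temporal-difference error on held-out offline transitions.

import numpy as np

# Hypothetical sketch of model-free selection for OPE: rank candidate
# Q-functions for a fixed target policy by mean squared TD error on
# held-out transitions. A generic baseline, NOT LSTD-Tournament.

def td_error_score(q_fn, pi, transitions, gamma=0.99):
    # q_fn: callable (state, action) -> float, a candidate Q-function
    # pi: callable (state) -> action, the (deterministic) target policy
    # transitions: iterable of (s, a, r, s_next, done) offline tuples
    errs = []
    for s, a, r, s_next, done in transitions:
        target = r if done else r + gamma * q_fn(s_next, pi(s_next))
        errs.append((q_fn(s, a) - target) ** 2)
    return float(np.mean(errs))

def select_candidate(candidates, pi, transitions, gamma=0.99):
    # Return the index of the lowest-scoring (best) candidate.
    scores = [td_error_score(q, pi, transitions, gamma) for q in candidates]
    return int(np.argmin(scores))

The selected Q-function then yields an OPE estimate as the average of Q(s0, pi(s0)) over initial states. Note that squared TD error is a biased selection criterion in stochastic environments (the double-sampling problem), which is one reason hyperparameter selection for OPE is nontrivial and motivates more careful selectors such as those the paper proposes.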

@article{liu2025_2502.08021,
  title={Model Selection for Off-policy Evaluation: New Algorithms and Experimental Protocol},
  author={Pai Liu and Lingfeng Zhao and Shivangi Agarwal and Jinghan Liu and Audrey Huang and Philip Amortila and Nan Jiang},
  journal={arXiv preprint arXiv:2502.08021},
  year={2025}
}