Off-Policy Evaluation and Learning for External Validity under a Covariate Shift

26 February 2020
Masahiro Kato
Masatoshi Uehara
Shota Yasui
    OffRL

Papers citing "Off-Policy Evaluation and Learning for External Validity under a Covariate Shift"

12 / 12 papers shown
Doubly Robust Alignment for Large Language Models
Erhan Xu
Kai Ye
Hongyi Zhou
Luhan Zhu
Francesco Quinzan
Chengchun Shi
24
0
0
01 Jun 2025
Counterfactual Learning with Multioutput Deep Kernels
A. Caron
G. Baio
I. Manolopoulou
BDL CML OffRL
72
1
0
20 Nov 2022
Bayesian Counterfactual Mean Embeddings and Off-Policy Evaluation
Diego Martinez-Taboada
Dino Sejdinovic
CML OffRL
43
0
0
02 Nov 2022
Unified Perspective on Probability Divergence via Maximum Likelihood Density Ratio Estimation: Bridging KL-Divergence and Integral Probability Metrics
Masahiro Kato
Masaaki Imaizumi
Kentaro Minami
64
0
0
31 Jan 2022
Evaluating the Robustness of Off-Policy Evaluation
Yuta Saito
Takuma Udagawa
Haruka Kiyohara
Kazuki Mogi
Yusuke Narita
Kei Tateno
ELM OffRL
33
38
0
31 Aug 2021
Combining Online Learning and Offline Learning for Contextual Bandits with Deficient Support
Hung The Tran
Sunil R. Gupta
Thanh Nguyen-Tang
Santu Rana
Svetha Venkatesh
OffRL
52
5
0
24 Jul 2021
Optimal Uniform OPE and Model-based Offline Reinforcement Learning in Time-Homogeneous, Reward-Free and Task-Agnostic Settings
Ming Yin
Yu Wang
OffRL
97
19
0
13 May 2021
Reliable Off-policy Evaluation for Reinforcement Learning
Jie Wang
Rui Gao
H. Zha
OffRL
79
11
0
08 Nov 2020
A Practical Guide of Off-Policy Evaluation for Bandit Problems
Masahiro Kato
Kenshi Abe
Kaito Ariu
Shota Yasui
OffRL
52
3
0
23 Oct 2020
Open Bandit Dataset and Pipeline: Towards Realistic and Reproducible Off-Policy Evaluation
Yuta Saito
Shunsuke Aihara
Megumi Matsutani
Yusuke Narita
OffRL
201
75
0
17 Aug 2020
Non-Negative Bregman Divergence Minimization for Deep Direct Density Ratio Estimation
Masahiro Kato
Takeshi Teshima
84
36
0
12 Jun 2020
Counterfactual Mean Embeddings
Krikamol Muandet
Motonobu Kanagawa
Sorawit Saengkyongam
S. Marukatat
CML OffRL
101
40
0
22 May 2018