ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1511.03722
  4. Cited By
Doubly Robust Off-policy Value Evaluation for Reinforcement Learning

Doubly Robust Off-policy Value Evaluation for Reinforcement Learning

11 November 2015
Nan Jiang
Lihong Li
    OffRL
ArXivPDFHTML

Papers citing "Doubly Robust Off-policy Value Evaluation for Reinforcement Learning"

11 / 11 papers shown
Title
DOLCE: Decomposing Off-Policy Evaluation/Learning into Lagged and Current Effects
DOLCE: Decomposing Off-Policy Evaluation/Learning into Lagged and Current Effects
Shu Tamano
Masanori Nojima
OffRL
131
0
0
02 May 2025
Statistical Inference in Reinforcement Learning: A Selective Survey
Statistical Inference in Reinforcement Learning: A Selective Survey
Chengchun Shi
OffRL
175
2
0
22 Feb 2025
The Best Instruction-Tuning Data are Those That Fit
The Best Instruction-Tuning Data are Those That Fit
Dylan Zhang
Qirun Dai
Hao Peng
ALM
150
6
0
06 Feb 2025
Efficient Policy Evaluation with Safety Constraint for Reinforcement Learning
Efficient Policy Evaluation with Safety Constraint for Reinforcement Learning
Claire Chen
Shuze Liu
Shangtong Zhang
OffRL
295
1
0
08 Oct 2024
Doubly Optimal Policy Evaluation for Reinforcement Learning
Doubly Optimal Policy Evaluation for Reinforcement Learning
Shuze Liu
Claire Chen
Shangtong Zhang
OffRL
124
3
0
03 Oct 2024
Uncertainty Calibration for Counterfactual Propensity Estimation in Recommendation
Uncertainty Calibration for Counterfactual Propensity Estimation in Recommendation
Wenbo Hu
Xin Sun
Qiang liu
Wenbo Hu
Shu Wu
64
0
0
23 Mar 2023
Open Bandit Dataset and Pipeline: Towards Realistic and Reproducible
  Off-Policy Evaluation
Open Bandit Dataset and Pipeline: Towards Realistic and Reproducible Off-Policy Evaluation
Yuta Saito
Shunsuke Aihara
Megumi Matsutani
Yusuke Narita
OffRL
108
75
0
17 Aug 2020
Off-policy Bandits with Deficient Support
Off-policy Bandits with Deficient Support
Noveen Sachdeva
Yi-Hsun Su
Thorsten Joachims
OffRL
114
75
0
16 Jun 2020
Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning
Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning
Philip S. Thomas
Emma Brunskill
OffRL
225
573
0
04 Apr 2016
An Emphatic Approach to the Problem of Off-policy Temporal-Difference
  Learning
An Emphatic Approach to the Problem of Off-policy Temporal-Difference Learning
R. Sutton
A. R. Mahmood
Martha White
59
269
0
14 Mar 2015
Unbiased Offline Evaluation of Contextual-bandit-based News Article
  Recommendation Algorithms
Unbiased Offline Evaluation of Contextual-bandit-based News Article Recommendation Algorithms
Lihong Li
Wei Chu
John Langford
Xuanhui Wang
OffRL
150
574
0
31 Mar 2010
1