ResearchTrend.AI
  • Communities
  • Connect sessions
  • AI calendar
  • Organizations
  • Join Slack
  • Contact Sales
Papers
Communities
Social Events
Terms and Conditions
Pricing
Contact Sales
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2026 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1906.03393
  4. Cited By
Towards Optimal Off-Policy Evaluation for Reinforcement Learning with
  Marginalized Importance Sampling
v1v2v3v4 (latest)

Towards Optimal Off-Policy Evaluation for Reinforcement Learning with Marginalized Importance Sampling

Neural Information Processing Systems (NeurIPS), 2019
8 June 2019
Tengyang Xie
Yifei Ma
Yu Wang
    OffRL
ArXiv (abs)PDFHTML

Papers citing "Towards Optimal Off-Policy Evaluation for Reinforcement Learning with Marginalized Importance Sampling"

50 / 130 papers shown
BAMAS: Structuring Budget-Aware Multi-Agent Systems
BAMAS: Structuring Budget-Aware Multi-Agent Systems
Liming Yang
Junyu Luo
Xuanzhe Liu
Yiling Lou
Zhenpeng Chen
LLMAG
413
0
0
26 Nov 2025
Offline Reinforcement Learning in Large State Spaces: Algorithms and Guarantees
Offline Reinforcement Learning in Large State Spaces: Algorithms and Guarantees
Nan Jiang
Tengyang Xie
OffRL
243
16
0
05 Oct 2025
Nested-ReFT: Efficient Reinforcement Learning for Large Language Model Fine-Tuning via Off-Policy Rollouts
Nested-ReFT: Efficient Reinforcement Learning for Large Language Model Fine-Tuning via Off-Policy Rollouts
M. Heuillet
Yufei Cui
Boxing Chen
Audrey Durand
Prasanna Parthasarathi
OffRLReLMLRM
428
0
1
13 Aug 2025
Demystifying the Paradox of Importance Sampling with an Estimated History-Dependent Behavior Policy in Off-Policy Evaluation
Demystifying the Paradox of Importance Sampling with an Estimated History-Dependent Behavior Policy in Off-Policy Evaluation
Hongyi Zhou
Josiah P. Hanna
Jin Zhu
Ying Yang
Chengchun Shi
OffRL
276
4
0
28 May 2025
STITCH-OPE: Trajectory Stitching with Guided Diffusion for Off-Policy Evaluation
STITCH-OPE: Trajectory Stitching with Guided Diffusion for Off-Policy Evaluation
Hossein Goli
Michael Gimelfarb
Nathan Samuel de Lara
Haruki Nishimura
Masha Itkina
Florian Shkurti
OffRL
395
2
0
27 May 2025
DOLCE: Decomposing Off-Policy Evaluation/Learning into Lagged and Current Effects
DOLCE: Decomposing Off-Policy Evaluation/Learning into Lagged and Current Effects
Shu Tamano
OffRL
513
0
0
02 May 2025
Reinforcement Learning with Continuous Actions Under Unmeasured Confounding
Reinforcement Learning with Continuous Actions Under Unmeasured Confounding
Yuhan Li
Eugene Han
Yifan Hu
Wenzhuo Zhou
Zhengling Qi
Yifan Cui
Ruoqing Zhu
OffRL
1.0K
2
0
01 May 2025
When Machine Learning Meets Importance Sampling: A More Efficient Rare Event Estimation Approach
When Machine Learning Meets Importance Sampling: A More Efficient Rare Event Estimation Approach
Ruoning Zhao
Xinyun Chen
163
0
0
18 Apr 2025
Statistical Tractability of Off-policy Evaluation of History-dependent Policies in POMDPs
Statistical Tractability of Off-policy Evaluation of History-dependent Policies in POMDPsInternational Conference on Learning Representations (ICLR), 2025
Yuheng Zhang
Nan Jiang
OffRL
305
5
0
03 Mar 2025
Reweighting Improves Conditional Risk Bounds
Reweighting Improves Conditional Risk Bounds
Yikai Zhang
Jiahe Lin
Fengpei Li
Songzhu Zheng
Anant Raj
Anderson Schneider
Yuriy Nevmyvaka
219
1
0
04 Jan 2025
Two-way Deconfounder for Off-policy Evaluation in Causal Reinforcement
  Learning
Two-way Deconfounder for Off-policy Evaluation in Causal Reinforcement LearningNeural Information Processing Systems (NeurIPS), 2024
Shuguang Yu
Shuxing Fang
Ruixin Peng
Zhengling Qi
Fan Zhou
C. Shi
CMLOffRL
387
8
0
08 Dec 2024
Concept-driven Off Policy Evaluation
Concept-driven Off Policy Evaluation
Ritam Majumdar
Jack Teversham
Sonali Parbhoo
OffRL
355
0
0
28 Nov 2024
Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering
Xinyan Guan
Yanjiang Liu
Xinyu Lu
Boxi Cao
Xianpei Han
...
Le Sun
Jie Lou
Bowen Yu
Yaojie Lu
Hongyu Lin
ALM
665
10
0
18 Nov 2024
Scalable Offline Reinforcement Learning for Mean Field Games
Scalable Offline Reinforcement Learning for Mean Field Games
Axel Brunnbauer
Julian Lemmel
Z. Babaiee
Sophie A. Neubauer
Radu Grosu
OffRL
283
0
0
23 Oct 2024
Primal-Dual Spectral Representation for Off-policy Evaluation
Primal-Dual Spectral Representation for Off-policy EvaluationInternational Conference on Artificial Intelligence and Statistics (AISTATS), 2024
Yang Hu
Tianyi Chen
Na Li
Kai Wang
Bo Dai
OffRL
339
4
0
23 Oct 2024
The Role of Inherent Bellman Error in Offline Reinforcement Learning
  with Linear Function Approximation
The Role of Inherent Bellman Error in Offline Reinforcement Learning with Linear Function Approximation
Noah Golowich
Ankur Moitra
OffRL
365
3
0
17 Jun 2024
A Fine-grained Analysis of Fitted Q-evaluation: Beyond Parametric Models
A Fine-grained Analysis of Fitted Q-evaluation: Beyond Parametric ModelsInternational Conference on Machine Learning (ICML), 2024
Jiayi Wang
Zhengling Qi
Raymond K. W. Wong
214
0
0
14 Jun 2024
RL in Latent MDPs is Tractable: Online Guarantees via Off-Policy
  Evaluation
RL in Latent MDPs is Tractable: Online Guarantees via Off-Policy Evaluation
Jeongyeol Kwon
Shie Mannor
Constantine Caramanis
Yonathan Efroni
OffRL
451
6
0
03 Jun 2024
Kernel Metric Learning for In-Sample Off-Policy Evaluation of
  Deterministic RL Policies
Kernel Metric Learning for In-Sample Off-Policy Evaluation of Deterministic RL Policies
Haanvid Lee
Tri Wahyu Guntara
Jongmin Lee
Yung-Kyun Noh
Kee-Eung Kim
OffRL
295
3
0
29 May 2024
Multiple-policy Evaluation via Density Estimation
Multiple-policy Evaluation via Density Estimation
Yilei Chen
Aldo Pacchiano
I. Paschalidis
OffRL
435
1
0
29 Mar 2024
On the Curses of Future and History in Future-dependent Value Functions
  for Off-policy Evaluation
On the Curses of Future and History in Future-dependent Value Functions for Off-policy Evaluation
Yuheng Zhang
Nan Jiang
OffRL
328
7
0
22 Feb 2024
Offline Multi-task Transfer RL with Representational Penalization
Offline Multi-task Transfer RL with Representational Penalization
Avinandan Bose
S. S. Du
Maryam Fazel
OffRL
383
13
0
19 Feb 2024
Off-Policy Evaluation in Markov Decision Processes under Weak Distributional Overlap
Off-Policy Evaluation in Markov Decision Processes under Weak Distributional Overlap
Mohammad Mehrabi
Stefan Wager
OffRL
364
18
0
13 Feb 2024
POTEC: Off-Policy Learning for Large Action Spaces via Two-Stage Policy
  Decomposition
POTEC: Off-Policy Learning for Large Action Spaces via Two-Stage Policy Decomposition
Yuta Saito
Jihan Yao
Thorsten Joachims
OffRL
345
14
0
09 Feb 2024
Probabilistic Offline Policy Ranking with Approximate Bayesian
  Computation
Probabilistic Offline Policy Ranking with Approximate Bayesian Computation
Longchao Da
Porter Jenkins
Trevor Schwantes
Jeffrey Dotson
Hua Wei
OffRL
246
3
0
17 Dec 2023
Marginal Density Ratio for Off-Policy Evaluation in Contextual Bandits
Marginal Density Ratio for Off-Policy Evaluation in Contextual BanditsNeural Information Processing Systems (NeurIPS), 2023
Muhammad Faaiz Taufiq
Arnaud Doucet
Rob Cornish
Jean-François Ton
OffRL
356
10
0
03 Dec 2023
Robust Offline Reinforcement learning with Heavy-Tailed Rewards
Robust Offline Reinforcement learning with Heavy-Tailed RewardsInternational Conference on Artificial Intelligence and Statistics (AISTATS), 2023
Jin Zhu
Runzhe Wan
Zhengling Qi
Shuang Luo
C. Shi
OffRL
399
5
0
28 Oct 2023
Counterfactual-Augmented Importance Sampling for Semi-Offline Policy
  Evaluation
Counterfactual-Augmented Importance Sampling for Semi-Offline Policy EvaluationNeural Information Processing Systems (NeurIPS), 2023
Shengpu Tang
Jenna Wiens
OffRLCML
304
6
0
26 Oct 2023
Sample Complexity of Preference-Based Nonparametric Off-Policy
  Evaluation with Deep Networks
Sample Complexity of Preference-Based Nonparametric Off-Policy Evaluation with Deep Networks
Zihao Li
Xiang Ji
Minshuo Chen
Mengdi Wang
OffRL
313
0
0
16 Oct 2023
When is Agnostic Reinforcement Learning Statistically Tractable?
When is Agnostic Reinforcement Learning Statistically Tractable?Neural Information Processing Systems (NeurIPS), 2023
Zeyu Jia
Gene Li
Alexander Rakhlin
Ayush Sekhari
Nathan Srebro
OffRL
380
7
0
09 Oct 2023
Stackelberg Batch Policy Learning
Stackelberg Batch Policy Learning
Wenzhuo Zhou
Annie Qu
OffRL
340
1
0
28 Sep 2023
Distributional Shift-Aware Off-Policy Interval Estimation: A Unified
  Error Quantification Framework
Distributional Shift-Aware Off-Policy Interval Estimation: A Unified Error Quantification Framework
Wenzhuo Zhou
Yuhan Li
Ruoqing Zhu
Annie Qu
OffRL
328
7
0
23 Sep 2023
A new Gradient TD Algorithm with only One Step-size: Convergence Rate
  Analysis using $L$-$λ$ Smoothness
A new Gradient TD Algorithm with only One Step-size: Convergence Rate Analysis using LLL-λλλ Smoothness
Hengshuai Yao
348
3
0
29 Jul 2023
Offline Policy Evaluation for Reinforcement Learning with Adaptively
  Collected Data
Offline Policy Evaluation for Reinforcement Learning with Adaptively Collected Data
Sunil Madhow
Dan Xiao
Ming Yin
Yu-Xiang Wang
OffRL
320
0
0
24 Jun 2023
High-probability sample complexities for policy evaluation with linear
  function approximation
High-probability sample complexities for policy evaluation with linear function approximationIEEE Transactions on Information Theory (IEEE Trans. Inf. Theory), 2023
Gen Li
Weichen Wu
Yuejie Chi
Cong Ma
Alessandro Rinaldo
Yuting Wei
OffRL
459
10
0
30 May 2023
Off-Policy Evaluation for Large Action Spaces via Conjunct Effect
  Modeling
Off-Policy Evaluation for Large Action Spaces via Conjunct Effect ModelingInternational Conference on Machine Learning (ICML), 2023
Yuta Saito
Qingyang Ren
Thorsten Joachims
CMLOffRL
358
33
0
14 May 2023
On the Sample Complexity of Vanilla Model-Based Offline Reinforcement
  Learning with Dependent Samples
On the Sample Complexity of Vanilla Model-Based Offline Reinforcement Learning with Dependent SamplesAAAI Conference on Artificial Intelligence (AAAI), 2023
Mustafa O. Karabag
Ufuk Topcu
OffRL
339
6
0
07 Mar 2023
HOPE: Human-Centric Off-Policy Evaluation for E-Learning and Healthcare
HOPE: Human-Centric Off-Policy Evaluation for E-Learning and HealthcareAdaptive Agents and Multi-Agent Systems (AAMAS), 2023
Ge Gao
Song Ju
Markel Sanz Ausin
Min Chi
OffRL
255
8
0
18 Feb 2023
Revisiting Bellman Errors for Offline Model Selection
Revisiting Bellman Errors for Offline Model SelectionInternational Conference on Machine Learning (ICML), 2023
Joshua P. Zitovsky
Daniel de Marchi
Rishabh Agarwal
Michael R. Kosorok University of North Carolina at Chapel Hill
OffRL
345
6
0
31 Jan 2023
Efficient Policy Evaluation with Offline Data Informed Behavior Policy
  Design
Efficient Policy Evaluation with Offline Data Informed Behavior Policy DesignInternational Conference on Machine Learning (ICML), 2023
Shuze Liu
Shangtong Zhang
OffRL
515
7
0
31 Jan 2023
A Reinforcement Learning Framework for Dynamic Mediation Analysis
A Reinforcement Learning Framework for Dynamic Mediation AnalysisInternational Conference on Machine Learning (ICML), 2023
Linjuan Ge
Jitao Wang
C. Shi
Zhanghua Wu
Rui Song
400
6
0
31 Jan 2023
Importance Weighted Actor-Critic for Optimal Conservative Offline
  Reinforcement Learning
Importance Weighted Actor-Critic for Optimal Conservative Offline Reinforcement LearningNeural Information Processing Systems (NeurIPS), 2023
Hanlin Zhu
Paria Rashidinejad
Jiantao Jiao
OffRL
566
20
0
30 Jan 2023
Model-based Offline Reinforcement Learning with Local Misspecification
Model-based Offline Reinforcement Learning with Local MisspecificationAAAI Conference on Artificial Intelligence (AAAI), 2023
Kefan Dong
Yannis Flet-Berliac
Allen Nie
Emma Brunskill
OffRL
265
6
0
26 Jan 2023
Off-Policy Evaluation for Action-Dependent Non-Stationary Environments
Off-Policy Evaluation for Action-Dependent Non-Stationary EnvironmentsNeural Information Processing Systems (NeurIPS), 2023
Yash Chandak
Shiv Shankar
Nathaniel D. Bastian
Bruno Castro da Silva
Emma Brunskil
Philip S. Thomas
OffRL
272
6
0
24 Jan 2023
Minimax Weight Learning for Absorbing MDPs
Minimax Weight Learning for Absorbing MDPsStatistical Papers (Stat. Pap.), 2023
Fengyin Li
Yuqiang Li
Xianyi Wu
OffRL
171
1
0
09 Jan 2023
An Instrumental Variable Approach to Confounded Off-Policy Evaluation
An Instrumental Variable Approach to Confounded Off-Policy EvaluationInternational Conference on Machine Learning (ICML), 2022
Yang Xu
Jin Zhu
C. Shi
Shuang Luo
R. Song
OffRL
365
24
0
29 Dec 2022
Policy learning "without" overlap: Pessimism and generalized empirical Bernstein's inequality
Policy learning "without" overlap: Pessimism and generalized empirical Bernstein's inequality
Ying Jin
Zhimei Ren
Zhuoran Yang
Zhaoran Wang
OffRL
592
31
0
19 Dec 2022
Scaling Marginalized Importance Sampling to High-Dimensional
  State-Spaces via State Abstraction
Scaling Marginalized Importance Sampling to High-Dimensional State-Spaces via State AbstractionAAAI Conference on Artificial Intelligence (AAAI), 2022
Brahma S. Pavse
Josiah P. Hanna
OffRL
228
9
0
14 Dec 2022
A Review of Off-Policy Evaluation in Reinforcement Learning
A Review of Off-Policy Evaluation in Reinforcement Learning
Masatoshi Uehara
C. Shi
Nathan Kallus
OffRL
303
114
0
13 Dec 2022
Low Variance Off-policy Evaluation with State-based Importance Sampling
Low Variance Off-policy Evaluation with State-based Importance SamplingConference on Algebraic Informatics (AI), 2022
David M. Bossens
Philip S. Thomas
OffRL
538
5
0
07 Dec 2022
123
Next
Page 1 of 3