
Instrumental Variable Value Iteration for Causal Offline Reinforcement Learning

19 February 2021
Luofeng Liao
Zuyue Fu
Zhuoran Yang
Yixin Wang
Mladen Kolar
Zhaoran Wang
    OffRL

Papers citing "Instrumental Variable Value Iteration for Causal Offline Reinforcement Learning"

30 / 30 papers shown
The Sample Complexity of Online Strategic Decision Making with Information Asymmetry and Knowledge Transportability
Jiachen Hu
Rui Ai
Han Zhong
Xiaoyu Chen
L. Wang
Zhaoran Wang
Zhuoran Yang
249
0
0
11 Jun 2025
Optimality and Adaptivity of Deep Neural Features for Instrumental Variable Regression
International Conference on Learning Representations (ICLR), 2025
Juno Kim
Dimitri Meunier
Arthur Gretton
Taiji Suzuki
Zhu Li
275
3
0
10 Jan 2025
Two-way Deconfounder for Off-policy Evaluation in Causal Reinforcement Learning
Neural Information Processing Systems (NeurIPS), 2024
Shuguang Yu
Shuxing Fang
Ruixin Peng
Zhengling Qi
Fan Zhou
C. Shi
CML, OffRL
387
8
0
08 Dec 2024
Causality for Large Language Models
Anpeng Wu
Kun Kuang
Minqin Zhu
Yingrong Wang
Yujia Zheng
Kairong Han
Yangqiu Song
Guangyi Chen
Leilei Gan
Kun Zhang
LRM
399
20
0
20 Oct 2024
Provably Mitigating Overoptimization in RLHF: Your SFT Loss is Implicitly an Adversarial Regularizer
Zhihan Liu
Miao Lu
Shenao Zhang
Boyi Liu
Hongyi Guo
Yingxiang Yang
Jose H. Blanchet
Zhaoran Wang
432
93
0
26 May 2024
Learning Causal Dynamics Models in Object-Oriented Environments
Zhongwei Yu
Jingqing Ruan
Dengpeng Xing
266
4
0
21 May 2024
Learning Decision Policies with Instrumental Variables through Double Machine Learning
International Conference on Machine Learning (ICML), 2024
Daqian Shao
Ashkan Soleymani
Francesco Quinzan
Marta Z. Kwiatkowska
586
4
0
14 May 2024
On the Opportunities and Challenges of Offline Reinforcement Learning for Recommender Systems
Xiaocong Chen
Siyu Wang
Julian McAuley
Dietmar Jannach
Lina Yao
OffRL
300
18
0
22 Aug 2023
Causal Reinforcement Learning: A Survey
Zhi-Hong Deng
Jing Jiang
Guodong Long
Chen Zhang
CML, LRM
391
36
0
04 Jul 2023
A Unified Framework of Policy Learning for Contextual Bandit with Confounding Bias and Missing Observations
Siyu Chen
Yitan Wang
Zhaoran Wang
Zhuoran Yang
OffRL
253
3
0
20 Mar 2023
Minimax Instrumental Variable Regression and $L_2$ Convergence Guarantees without Identification or Closedness
Annual Conference Computational Learning Theory (COLT), 2023
Andrew Bennett
Nathan Kallus
Xiaojie Mao
Whitney Newey
Vasilis Syrgkanis
Masatoshi Uehara
340
18
0
10 Feb 2023
A Survey on Causal Reinforcement Learning
IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2023
Yan Zeng
Ruichu Cai
Gang Hua
Libo Huang
Zijian Li
CML
542
70
0
10 Feb 2023
Robust Fitted-Q-Evaluation and Iteration under Sequentially Exogenous Unobserved Confounders
David Bruns-Smith
Angela Zhou
OffRL
701
14
0
01 Feb 2023
An Instrumental Variable Approach to Confounded Off-Policy Evaluation
International Conference on Machine Learning (ICML), 2022
Yang Xu
Jin Zhu
C. Shi
Shuang Luo
R. Song
OffRL
365
24
0
29 Dec 2022
Offline Reinforcement Learning for Human-Guided Human-Machine Interaction with Private Information
Management Sciences (MS), 2022
Zuyue Fu
Zhengling Qi
Zhuoran Yang
Zhaoran Wang
Lan Wang
OffRL
224
1
0
23 Dec 2022
Optimal Treatment Regimes for Proximal Causal Learning
Neural Information Processing Systems (NeurIPS), 2022
Tao Shen
Yifan Cui
CML
446
5
0
19 Dec 2022
Instrumental Variables in Causal Inference and Machine Learning: A Survey
ACM Computing Surveys (ACM CSUR), 2022
Anpeng Wu
Kun Kuang
Ruoxuan Xiong
Leilei Gan
SyDa, CML
308
20
0
12 Dec 2022
Offline Policy Evaluation and Optimization under Confounding
International Conference on Artificial Intelligence and Statistics (AISTATS), 2022
Chinmaya Kausik
Yangyi Lu
Kevin Tan
Maggie Makar
Yixin Wang
Ambuj Tewari
OffRL
430
15
0
29 Nov 2022
Off-Policy Evaluation for Episodic Partially Observable Markov Decision Processes under Non-Parametric Models
Neural Information Processing Systems (NeurIPS), 2022
Rui Miao
Zhengling Qi
Xiaoke Zhang
OffRL
352
15
0
21 Sep 2022
Statistical Estimation of Confounded Linear MDPs: An Instrumental Variable Approach
Miao Lu
Wenhao Yang
Liangyu Zhang
Zhihua Zhang
OffRL
248
1
0
12 Sep 2022
Strategic Decision-Making in the Presence of Information Asymmetry: Provably Efficient RL with Algorithmic Instruments
Mengxin Yu
Zhuoran Yang
Jianqing Fan
OffRL
363
9
0
23 Aug 2022
Future-Dependent Value-Based Off-Policy Evaluation in POMDPs
Neural Information Processing Systems (NeurIPS), 2022
Masatoshi Uehara
Haruka Kiyohara
Andrew Bennett
Victor Chernozhukov
Nan Jiang
Nathan Kallus
C. Shi
Wen Sun
OffRL
509
25
0
26 Jul 2022
A Minimax Learning Approach to Off-Policy Evaluation in Confounded Partially Observable Markov Decision Processes
International Conference on Machine Learning (ICML), 2021
C. Shi
Masatoshi Uehara
Jiawei Huang
Nan Jiang
OffRL
441
31
0
12 Nov 2021
Proximal Reinforcement Learning: Efficient Off-Policy Evaluation in Partially Observed Markov Decision Processes
Operational Research (OR), 2021
Andrew Bennett
Nathan Kallus
OffRL
262
59
0
28 Oct 2021
Instrument Space Selection for Kernel Maximum Moment Restriction
Rui Zhang
Krikamol Muandet
Bernhard Schölkopf
Masaaki Imaizumi
184
3
0
07 Jun 2021
On Instrumental Variable Regression for Deep Offline Policy Evaluation
Journal of Machine Learning Research (JMLR), 2021
Yutian Chen
Liyuan Xu
Çağlar Gülçehre
T. Paine
Arthur Gretton
Nando de Freitas
Arnaud Doucet
OffRL
339
24
0
21 May 2021
Estimating and Improving Dynamic Treatment Regimes With a Time-Varying Instrumental Variable
Shuxiao Chen
B. Zhang
339
26
0
15 Apr 2021
An Adaptive Stochastic Sequential Quadratic Programming with Differentiable Exact Augmented Lagrangians
Mathematical Programming (Math. Program.), 2021
Sen Na
M. Anitescu
Mladen Kolar
354
58
0
10 Feb 2021
Provably Efficient Causal Reinforcement Learning with Confounded Observational Data
Lingxiao Wang
Zhuoran Yang
Zhaoran Wang
OffRL
267
58
0
22 Jun 2020
Dynamic Causal Effects Evaluation in A/B Testing with a Reinforcement Learning Framework
Journal of the American Statistical Association (JASA), 2020
C. Shi
Xiaoyu Wang
Shuang Luo
Hongtu Zhu
Jieping Ye
R. Song
CML, OffRL
671
51
0
05 Feb 2020