Off-Policy Evaluation via the Regularized Lagrangian

7 July 2020
Mengjiao Yang
Ofir Nachum
Bo Dai
Lihong Li
Dale Schuurmans
arXiv: 2007.03438

Papers citing "Off-Policy Evaluation via the Regularized Lagrangian"

Showing 50 of 80 citing papers.
Semi-gradient DICE for Offline Constrained Reinforcement Learning
Woosung Kim, JunHo Seo, Jongmin Lee, Byung-Jun Lee
10 Jun 2025

STITCH-OPE: Trajectory Stitching with Guided Diffusion for Off-Policy Evaluation
Hossein Goli, Michael Gimelfarb, Nathan Samuel de Lara, Haruki Nishimura, Masha Itkina, Florian Shkurti
27 May 2025

An Optimal Discriminator Weighted Imitation Perspective for Reinforcement Learning (ICLR 2025)
Haoran Xu, Shuozhe Li, Harshit S. Sikchi, S. Niekum, Amy Zhang
17 Apr 2025

Average-DICE: Stationary Distribution Correction by Regression
Fengdi Che, Bryan Chan, Chen Ma, A. R. Mahmood
03 Mar 2025

Can RLHF be More Efficient with Imperfect Reward Models? A Policy Coverage Perspective
Jiawei Huang, Bingcong Li, Christoph Dann, Niao He
26 Feb 2025

SimuDICE: Offline Policy Optimization Through World Model Updates and DICE Estimation
Catalin E. Brita, Stephan Bongers, F. Oliehoek
09 Dec 2024
Hybrid Transfer Reinforcement Learning: Provable Sample Efficiency from Shifted-Dynamics Data (AISTATS 2024)
Chengrui Qu, Laixi Shi, Kishan Panaganti, Pengcheng You, Adam Wierman
06 Nov 2024

Off-Policy Selection for Initiating Human-Centric Experimental Design (NeurIPS 2024)
Ge Gao, Xi Yang, Qitong Gao, Song Ju, Miroslav Pajic, Min Chi
26 Oct 2024

Primal-Dual Spectral Representation for Off-policy Evaluation (AISTATS 2024)
Yang Hu, Tianyi Chen, Na Li, Kai Wang, Bo Dai
23 Oct 2024

Target Networks and Over-parameterization Stabilize Off-policy Bootstrapping with Function Approximation (ICML 2024)
Fengdi Che, Chenjun Xiao, Jincheng Mei, Bo Dai, Ramki Gummadi, Oscar A Ramirez, Christopher K Harris, A. R. Mahmood, Dale Schuurmans
31 May 2024
Kernel Metric Learning for In-Sample Off-Policy Evaluation of Deterministic RL Policies
Haanvid Lee, Tri Wahyu Guntara, Jongmin Lee, Yung-Kyun Noh, Kee-Eung Kim
29 May 2024

OPERA: Automatic Offline Policy Evaluation with Re-weighted Aggregates of Multiple Estimators
Allen Nie, Yash Chandak, Christina J. Yuan, Anirudhan Badrinath, Yannis Flet-Berliac, Emma Brunskil
27 May 2024

Offline Multi-task Transfer RL with Representational Penalization
Avinandan Bose, S. S. Du, Maryam Fazel
19 Feb 2024

ODICE: Revealing the Mystery of Distribution Correction Estimation via Orthogonal-gradient Update
Liyuan Mao, Haoran Xu, Weinan Zhang, Xianyuan Zhan
01 Feb 2024

Probabilistic Offline Policy Ranking with Approximate Bayesian Computation
Longchao Da, Porter Jenkins, Trevor Schwantes, Jeffrey Dotson, Hua Wei
17 Dec 2023
Towards Assessing and Benchmarking Risk-Return Tradeoff of Off-Policy Evaluation (ICLR 2023)
Haruka Kiyohara, Ren Kishimoto, K. Kawakami, Ken Kobayashi, Kazuhide Nakata, Yuta Saito
30 Nov 2023

SCOPE-RL: A Python Library for Offline Reinforcement Learning and Off-Policy Evaluation
Haruka Kiyohara, Ren Kishimoto, K. Kawakami, Ken Kobayashi, Kazuhide Nakata, Yuta Saito
30 Nov 2023

When is Off-Policy Evaluation Useful? A Data-Centric Perspective
Hao Sun, Alex J. Chan, Nabeel Seedat, Alihan Huyuk, M. Schaar
23 Nov 2023

State-Action Similarity-Based Representations for Off-Policy Evaluation (NeurIPS 2023)
Brahma S. Pavse, Josiah P. Hanna
27 Oct 2023

Off-Policy Evaluation for Human Feedback (NeurIPS 2023)
Qitong Gao, Ge Gao, Juncheng Dong, Vahid Tarokh, Min Chi, Miroslav Pajic
11 Oct 2023
High-probability sample complexities for policy evaluation with linear function approximation (IEEE Trans. Inf. Theory, 2023)
Gen Li, Weichen Wu, Yuejie Chi, Cong Ma, Alessandro Rinaldo, Yuting Wei
30 May 2023

A Unified Framework of Policy Learning for Contextual Bandit with Confounding Bias and Missing Observations
Siyu Chen, Yitan Wang, Zhaoran Wang, Zhuoran Yang
20 Mar 2023

Offline Imitation Learning with Suboptimal Demonstrations via Relaxed Distribution Matching (AAAI 2023)
Lantao Yu, Tianhe Yu, Jiaming Song, Willie Neiswanger, Stefano Ermon
05 Mar 2023

Hallucinated Adversarial Control for Conservative Offline Policy Evaluation (UAI 2023)
Jonas Rothfuss, Bhavya Sukhija, Tobias Birchler, Parnian Kassraie, Andreas Krause
02 Mar 2023

Distributional Offline Policy Evaluation with Predictive Error Guarantees (ICML 2023)
Runzhe Wu, Masatoshi Uehara, Wen Sun
19 Feb 2023
Conservative State Value Estimation for Offline Reinforcement Learning (NeurIPS 2023)
Liting Chen, Jie Yan, Zhengdao Shao, Lu Wang, Qingwei Lin, Saravan Rajmohan, Thomas Moscibroda, Dongmei Zhang
14 Feb 2023

Offline Learning of Closed-Loop Deep Brain Stimulation Controllers for Parkinson Disease Treatment (ICCPS 2023)
Qitong Gao, Stephen L. Schimdt, Afsana Chowdhury, Guangyu Feng, Jennifer J. Peters, Katherine Genty, W. Grill, Dennis A. Turner, Miroslav Pajic
05 Feb 2023

Offline Minimax Soft-Q-learning Under Realizability and Partial Coverage (NeurIPS 2023)
Masatoshi Uehara, Nathan Kallus, Jason D. Lee, Wen Sun
05 Feb 2023

Revisiting Bellman Errors for Offline Model Selection (ICML 2023)
Joshua P. Zitovsky, Daniel de Marchi, Rishabh Agarwal, Michael R. Kosorok (University of North Carolina at Chapel Hill)
31 Jan 2023
Efficient Policy Evaluation with Offline Data Informed Behavior Policy Design (ICML 2023)
Shuze Liu, Shangtong Zhang
31 Jan 2023

Variational Latent Branching Model for Off-Policy Evaluation (ICLR 2023)
Qitong Gao, Ge Gao, Min Chi, Miroslav Pajic
28 Jan 2023

Off-Policy Evaluation for Action-Dependent Non-Stationary Environments (NeurIPS 2023)
Yash Chandak, Shiv Shankar, Nathaniel D. Bastian, Bruno Castro da Silva, Emma Brunskil, Philip S. Thomas
24 Jan 2023

Scaling Marginalized Importance Sampling to High-Dimensional State-Spaces via State Abstraction (AAAI 2022)
Brahma S. Pavse, Josiah P. Hanna
14 Dec 2022

A Review of Off-Policy Evaluation in Reinforcement Learning
Masatoshi Uehara, C. Shi, Nathan Kallus
13 Dec 2022
When is Realizability Sufficient for Off-Policy Reinforcement Learning? (ICML 2022)
Andrea Zanette
10 Nov 2022

Optimal Conservative Offline RL with General Function Approximation via Augmented Lagrangian (ICLR 2022)
Paria Rashidinejad, Hanlin Zhu, Kunhe Yang, Stuart J. Russell, Jiantao Jiao
01 Nov 2022

Beyond the Return: Off-policy Function Estimation under User-specified Error-measuring Distributions (NeurIPS 2022)
Audrey Huang, Nan Jiang
27 Oct 2022

A Unified Framework for Alternating Offline Model Training and Policy Learning (NeurIPS 2022)
Shentao Yang, Shujian Zhang, Yihao Feng, Mi Zhou
12 Oct 2022

Inference on Strongly Identified Functionals of Weakly Identified Functions (COLT 2022)
Andrew Bennett, Nathan Kallus, Xiaojie Mao, Whitney Newey, Vasilis Syrgkanis, Masatoshi Uehara
17 Aug 2022
Lagrangian Method for Q-Function Learning (with Applications to Machine Translation) (ICML 2022)
Bojun Huang
22 Jul 2022

Learning Bellman Complete Representations for Offline Policy Evaluation (ICML 2022)
Jonathan D. Chang, Kaiwen Wang, Nathan Kallus, Wen Sun
12 Jul 2022

Markovian Interference in Experiments (NeurIPS 2022)
Vivek F. Farias, Andrew A. Li, Tianyi Peng, Andrew Zheng
06 Jun 2022

Hybrid Value Estimation for Off-policy Evaluation and Offline Reinforcement Learning
Xuefeng Jin, Xu-Hui Liu, Shengyi Jiang, Yang Yu
04 Jun 2022

Variance Reduction based Partial Trajectory Reuse to Accelerate Policy Gradient Optimization
Hua Zheng, Wei Xie
06 May 2022
COptiDICE: Offline Constrained Reinforcement Learning via Stationary Distribution Correction Estimation (ICLR 2022)
Jongmin Lee, Cosmin Paduraru, D. Mankowitz, N. Heess, Doina Precup, Kee-Eung Kim, A. Guez
19 Apr 2022

Marginalized Operators for Off-policy Reinforcement Learning (AISTATS 2022)
Yunhao Tang, Mark Rowland, Rémi Munos, Michal Valko
30 Mar 2022

Offline Reinforcement Learning Under Value and Density-Ratio Realizability: The Power of Gaps (UAI 2022)
Jinglin Chen, Nan Jiang
25 Mar 2022

Bellman Residual Orthogonalization for Offline Reinforcement Learning (NeurIPS 2022)
Andrea Zanette, Martin J. Wainwright
24 Mar 2022

DARA: Dynamics-Aware Reward Augmentation in Offline Reinforcement Learning (ICLR 2022)
Jinxin Liu, Hongyin Zhang, Xuetao Zhang
13 Mar 2022

LobsDICE: Offline Learning from Observation via Stationary Distribution Correction Estimation (NeurIPS 2022)
Geon-hyeong Kim, Jongmin Lee, Youngsoo Jang, Hongseok Yang, Kyungmin Kim
28 Feb 2022
Page 1 of 2.