Off-Policy Evaluation via Off-Policy Classification

Neural Information Processing Systems (NeurIPS), 2019
4 June 2019
A. Irpan
Kanishka Rao
Konstantinos Bousmalis
Chris Harris
Julian Ibarz
Sergey Levine
    OffRL

Papers citing "Off-Policy Evaluation via Off-Policy Classification"

38 / 38 papers shown
Few Dimensions are Enough: Fine-tuning BERT with Selected Dimensions Revealed Its Redundant Nature
Shion Fukuhata
Yoshinobu Kano
219
1
0
07 Apr 2025
Clustering Context in Off-Policy Evaluation
International Conference on Artificial Intelligence and Statistics (AISTATS), 2025
Daniel Guzman-Olivares
Philipp Schmidt
Jacek Golebiowski
Artur Bekasov
CML, OffRL
189
1
0
28 Feb 2025
Two-way Deconfounder for Off-policy Evaluation in Causal Reinforcement Learning
Neural Information Processing Systems (NeurIPS), 2024
Shuguang Yu
Shuxing Fang
Ruixin Peng
Zhengling Qi
Fan Zhou
C. Shi
CML, OffRL
300
6
0
08 Dec 2024
Practical Performative Policy Learning with Strategic Agents
Qianyi Chen
Ying Chen
Bo Li
577
2
0
02 Dec 2024
OMPO: A Unified Framework for RL under Policy and Dynamics Shifts
Yu-Juan Luo
Tianying Ji
Gang Hua
Jianwei Zhang
Huazhe Xu
Xianyuan Zhan
OffRL
292
4
0
29 May 2024
$\pi2\text{vec}$: Policy Representations with Successor Features
International Conference on Learning Representations (ICLR), 2023
Gianluca Scarpellini
Ksenia Konyushkova
Claudio Fantacci
T. Paine
Yutian Chen
Misha Denil
OffRL
199
1
0
16 Jun 2023
Pruning the Way to Reliable Policies: A Multi-Objective Deep Q-Learning Approach to Critical Care
IEEE Journal of Biomedical and Health Informatics (IEEE JBHI), 2023
Ali Shirali
Alexander Schubert
Ahmed Alaa
OffRL
242
8
0
13 Jun 2023
Deep RL at Scale: Sorting Waste in Office Buildings with a Fleet of Mobile Manipulators
Alexander Herzog
Kanishka Rao
Karol Hausman
Yao Lu
Paul Wohlhart
...
Noah Brown
Mrinal Kalakrishnan
Julian Ibarz
P. Pastor
Sergey Levine
OffRL
232
36
0
05 May 2023
Rescue Conversations from Dead-ends: Efficient Exploration for Task-oriented Dialogue Policy Optimization
Transactions of the Association for Computational Linguistics (TACL), 2023
Yangyang Zhao
Zhenyu Wang
Mehdi Dastani
Shihan Wang
190
2
0
05 May 2023
Towards Real-World Applications of Personalized Anesthesia Using Policy Constraint Q Learning for Propofol Infusion Control
IEEE Journal of Biomedical and Health Informatics (IEEE JBHI), 2023
Xiuding Cai
Jiao Chen
Yaoyao Zhu
Beiming Wang
Yu Yao
OffRL
255
13
0
17 Mar 2023
HOPE: Human-Centric Off-Policy Evaluation for E-Learning and Healthcare
Adaptive Agents and Multi-Agent Systems (AAMAS), 2023
Ge Gao
Song Ju
Markel Sanz Ausin
Min Chi
OffRL
199
8
0
18 Feb 2023
Revisiting Bellman Errors for Offline Model Selection
International Conference on Machine Learning (ICML), 2023
Joshua P. Zitovsky
Daniel de Marchi
Rishabh Agarwal
Michael R. Kosorok (University of North Carolina at Chapel Hill)
OffRL
277
5
0
31 Jan 2023
RT-1: Robotics Transformer for Real-World Control at Scale
Anthony Brohan
Noah Brown
Justice Carbajal
Yevgen Chebotar
Joseph Dabis
...
Ted Xiao
Peng Xu
Sichun Xu
Tianhe Yu
Brianna Zitkovich
LM&Ro
500
1,749
0
13 Dec 2022
Policy-Adaptive Estimator Selection for Off-Policy Evaluation
AAAI Conference on Artificial Intelligence (AAAI), 2022
Takuma Udagawa
Haruka Kiyohara
Yusuke Narita
Yuta Saito
Keisuke Tateno
OffRL
214
27
0
25 Nov 2022
Offline Policy Comparison with Confidence: Benchmarks and Baselines
Anurag Koul
Mariano Phielipp
Alan Fern
OffRL
211
0
0
22 May 2022
Why Should I Trust You, Bellman? The Bellman Error is a Poor Replacement for Value Error
International Conference on Machine Learning (ICML), 2022
Scott Fujimoto
David Meger
Doina Precup
Ofir Nachum
S. Gu
340
39
0
28 Jan 2022
Dynamics-Aware Comparison of Learned Reward Functions
International Conference on Learning Representations (ICLR), 2022
Blake Wulfe
Ashwin Balakrishna
Logan Ellis
Jean Mercat
R. McAllister
Adrien Gaidon
131
15
0
25 Jan 2022
Validate on Sim, Detect on Real -- Model Selection for Domain Randomization
IEEE International Conference on Robotics and Automation (ICRA), 2021
Gal Leibovich
Guy Jacob
Shadi Endrawis
Gal Novik
Aviv Tamar
268
7
0
01 Nov 2021
Medical Dead-ends and Learning to Identify High-risk States and Treatments
Neural Information Processing Systems (NeurIPS), 2021
Mehdi Fatemi
Taylor W. Killian
J. Subramanian
Marzyeh Ghassemi
OffRL
229
47
0
08 Oct 2021
Showing Your Offline Reinforcement Learning Work: Online Evaluation Budget Matters
International Conference on Machine Learning (ICML), 2021
Vladislav Kurenkov
Sergey Kolesnikov
OffRL
337
24
0
08 Oct 2021
Model Selection for Offline Reinforcement Learning: Practical Considerations for Healthcare Settings
Machine Learning in Health Care (MLHC), 2021
Shengpu Tang
Jenna Wiens
OffRL
182
90
0
23 Jul 2021
Supervised Off-Policy Ranking
Yue Jin
Yue Zhang
Tao Qin
Xudong Zhang
Jian Yuan
Houqiang Li
Tie-Yan Liu
OffRL
188
6
0
03 Jul 2021
Offline Policy Comparison under Limited Historical Agent-Environment Interactions
Anton Dereventsov
Joseph Daws
Clayton Webster
OffRL
123
3
0
07 Jun 2021
Model Selection for Production System via Automated Online Experiments
Neural Information Processing Systems (NeurIPS), 2021
Zhenwen Dai
Praveen Chandar
G. Fazelnia
Ben Carterette
M. Lalmas
212
5
0
27 May 2021
Benchmarks for Deep Off-Policy Evaluation
International Conference on Learning Representations (ICLR), 2021
Justin Fu
Mohammad Norouzi
Ofir Nachum
George Tucker
Ziyun Wang
...
Yutian Chen
Aviral Kumar
Cosmin Paduraru
Sergey Levine
T. Paine
ELM, OffRL
210
110
0
30 Mar 2021
Replacing Rewards with Examples: Example-Based Policy Search via Recursive Classification
Neural Information Processing Systems (NeurIPS), 2021
Benjamin Eysenbach
Sergey Levine
Ruslan Salakhutdinov
OffRL
349
53
0
23 Mar 2021
Delayed Rewards Calibration via Reward Empirical Sufficiency
Yixuan Liu
Hu Wang
Xiaowei Wang
Xiaoyue Sun
Liuyue Jiang
Minhui Xue
172
0
0
21 Feb 2021
Minimax Off-Policy Evaluation for Multi-Armed Bandits
IEEE Transactions on Information Theory (IEEE Trans. Inf. Theory), 2021
Cong Ma
Banghua Zhu
Jiantao Jiao
Martin J. Wainwright
OffRL
179
11
0
19 Jan 2021
Offline Policy Selection under Uncertainty
International Conference on Artificial Intelligence and Statistics (AISTATS), 2020
Mengjiao Yang
Bo Dai
Ofir Nachum
George Tucker
Dale Schuurmans
OffRL
219
35
0
12 Dec 2020
Predicting Sim-to-Real Transfer with Probabilistic Dynamics Models
Lei M. Zhang
Matthias Plappert
Wojciech Zaremba
112
4
0
27 Sep 2020
Open Bandit Dataset and Pipeline: Towards Realistic and Reproducible Off-Policy Evaluation
Yuta Saito
Shunsuke Aihara
Megumi Matsutani
Yusuke Narita
OffRL
676
88
0
17 Aug 2020
Hyperparameter Selection for Offline Reinforcement Learning
T. Paine
Cosmin Paduraru
Andrea Michi
Çağlar Gülçehre
Konrad Zolna
Alexander Novikov
Ziyun Wang
Nando de Freitas
GP, OffRL
345
154
0
17 Jul 2020
Never Stop Learning: The Effectiveness of Fine-Tuning in Robotic Reinforcement Learning
Ryan Julian
Benjamin Swanson
Gaurav Sukhatme
Sergey Levine
Chelsea Finn
Karol Hausman
OnRL, CLL
200
45
0
21 Apr 2020
Debiased Off-Policy Evaluation for Recommendation Systems
ACM Conference on Recommender Systems (RecSys), 2020
Yusuke Narita
Shota Yasui
Kohei Yata
OffRL
226
15
0
20 Feb 2020
Behavior Regularized Offline Reinforcement Learning
Yifan Wu
George Tucker
Ofir Nachum
OffRL
478
770
0
26 Nov 2019
BAIL: Best-Action Imitation Learning for Batch Deep Reinforcement Learning
Neural Information Processing Systems (NeurIPS), 2019
Xinyue Chen
Zijian Zhou
Liang Luo
Che Wang
Yanqiu Wu
George Andriopoulos
OffRL
333
135
0
27 Oct 2019
Ctrl-Z: Recovering from Instability in Reinforcement Learning
Vibhavari Dasagi
Jake Bruce
T. Peynot
Jurgen Leitner
141
11
0
09 Oct 2019
An Optimistic Perspective on Offline Reinforcement Learning
Rishabh Agarwal
Dale Schuurmans
Mohammad Norouzi
OffRL, OnRL
452
71
0
10 Jul 2019